XXXI. — On  Inheritance  of  Hair  and  Eye  Colour. 
By  John  Brownlee,  M.D.,  D.Sc. 

(MS.  received  June  17,  1912.  Read  same  date.) 


Some  time  ago,  in  a paper  published  by  the  Royal  Anthropological  Institute, 
I applied  a Mendelian  analysis  to  that  part  of  the  observations  made  by  the 
late  Dr  Beddoe  (1)  which  refers  to  the  colour  of  the  hair.  In  that  paper  (2) 
I showed  that  these  observations  obeyed  in  a highly  remarkable  degree 
the  law  referred  to,  and  that  this  result  held  from  the  north  of  Scotland, 
through  the  whole  of  England,  Ireland,  France,  and  Germany,  to  the  south 
of  Italy.  At  that  time  I was  unable  to  make  any  application  to  the 
observations  on  eye  colour  also  published  in  the  same  work,  but  I have  now 
succeeded  in  completing  the  analysis. 

The  whole  depends  on  a theorem  of  population  stability  which  may  be 
easily  proved. 

Let  the  population  consist  of  a mixture  of  two  races  having  two 
characters  such  as  hair  colour  and  eye  colour  inherited  according  to  the 
Mendelian  law  of  segregation.  Let  these  qualities  be  denoted  by  (BB), 
(bb)  for  the  hair,  and  (DD),  (dd)  for  the  eyes.  Then  the  population  may 
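be considered given by

$$\begin{aligned}
 &a^2(DD, BB) + 2ab(DD, Bb) + b^2(DD, bb) \\
+\,&2ac(Dd, BB) + (2ad + 2bc)(Dd, Bb) + 2bd(Dd, bb) \\
+\,&c^2(dd, BB) + 2cd(dd, Bb) + d^2(dd, bb). \qquad (1)
\end{aligned}$$

If this population mate freely, and if all matings possess equal fertility,
the relationship of the constants required for a stable population depends
on whether coupling exists or not.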

* The factors outside the brackets are the proportional numbers of each variety. The
simple case is: if $x(A, A)$ mate at random with itself and with $y(a, a)$ and all subsequent
matings are equally probable, the stable population is given by
$x^2(A, A) + 2xy(A, a) + y^2(a, a)$.

The meaning of the term "coupling" may be easily seen from a consideration
of the different units in the above expression. It will be noticed that
every term of the expression except that in the middle has either two eye
units or two hair units the same. It is thus impossible when division takes
place for anything else to occur than that two constantly linked pairs are
given off in equal numbers. Thus (Dd, bb) can only divide into (d, b) and
(D, b). But when we consider the case of (Dd, Bb) other things may easily
happen. If D have a greater affinity for B than for b, then we may have more
of the element (D, B) given off than of the element (D, b).* But here also
there is a necessary arithmetical relationship between the different elements
resulting, and if n elements (D, B) occur for one element (D, b), it follows
that there will also be n elements (d, b) for one element (d, B), even
although the attraction of D for B might be different from that of d for b.

If the population (1) mate freely, and if in the division of (Dd, Bb) m
elements (D, B) occur with n elements (D, b) (where $m + n = \tfrac{1}{2}$ and
$2(ad + bc)$ is denoted by $h$), the next generation will be given by

$$\{(a^2 + ab + ac + mh)(D, B) + (b^2 + ab + bd + nh)(D, b) + (c^2 + cd + ca + nh)(d, B) + (d^2 + db + dc + mh)(d, b)\}^2. \qquad (2)$$

This has exactly the same form as that from which it is derived, but
the relative proportions of the different classes may be different. If the
population is stable we have as the sufficient conditions,

$$\frac{(a^2 + ab + ac + mh)^2}{a^2} = \frac{(b^2 + ab + bd + nh)^2}{b^2} = \frac{(c^2 + cd + ca + nh)^2}{c^2} = \frac{(d^2 + db + dc + mh)^2}{d^2},$$

as all the similar relationships hold if these are true.

Taking the first equation

$$\frac{(a^2 + ab + ac + mh)^2}{a^2} = \frac{(b^2 + ba + bd + nh)^2}{b^2},$$

we have, since the positive root must be taken,

$$a + b + c + \frac{mh}{a} = b + a + d + \frac{nh}{b},$$

$$c + \frac{mh}{a} = d + \frac{nh}{b},$$

$$abc + 2mabd + 2mb^2c = abd + 2na^2d + 2nabc,$$

since $h = 2(ad + bc)$;

$$\phantom{abc + 2mabd + 2mb^2c} = abd + (1 - 2m)a^2d + (1 - 2m)abc,$$

since $2(m + n) = 1$;

$$2m(abd + b^2c + a^2d + abc) = abd + a^2d,$$

$$2m(a + b)(ad + bc) = (a + b)ad,$$

$$2m(ad + bc) = ad,$$

$$2mbc = (1 - 2m)ad = 2nad.$$

The other equations also reduce to this, so that

$$\frac{ad}{bc} = \frac{m}{n}$$

is the criterion of stability if coupling exists. If there is no coupling,
$m = n$ and $ad = bc$.

* The assumption made here is that there is no special mortality or instability among
the pairs which are actually formed.
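The criterion just derived can be checked numerically. The following is a minimal sketch, assuming the gametic output of one round of free mating is exactly that written in expression (2); the function names `next_gametes` and `is_stable` are illustrative and do not appear in the paper.

```python
from fractions import Fraction


def next_gametes(a, b, c, d, m, n):
    """Gametic output after one round of free mating, as in expression (2).

    a, b, c, d are the proportions of (D,B), (D,b), (d,B), (d,b);
    m and n are the coupling constants, with m + n = 1/2.
    """
    h = 2 * (a * d + b * c)  # number of double hybrids (Dd, Bb)
    return (
        a * a + a * b + a * c + m * h,
        b * b + a * b + b * d + n * h,
        c * c + c * d + a * c + n * h,
        d * d + b * d + c * d + m * h,
    )


def is_stable(a, b, c, d, m, n):
    """True when every class is multiplied by the same factor (the total)."""
    total = a + b + c + d
    new = next_gametes(a, b, c, d, m, n)
    return all(x == total * y for x, y in zip(new, (a, b, c, d)))


# Coupling 7 : 1, i.e. m = 7/16 and n = 1/16.
M, N = Fraction(7, 16), Fraction(1, 16)
stable_with_coupling = is_stable(7, 1, 1, 1, M, N)      # ad/bc = 7 = m/n
unstable_with_coupling = is_stable(1, 1, 1, 1, M, N)    # ad/bc = 1
stable_without_coupling = is_stable(2, 1, 4, 2, Fraction(1, 4), Fraction(1, 4))
```

Exact fractions are used so that the equality test is not disturbed by rounding.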


Some remarks may be made in this place concerning the meaning of
coupling. It has two forms; either each unit has a special attraction for
the corresponding unit originally associated with it, or on the other hand
for the one with which it has come in contact when hybridisation occurs.
The theory at present advanced by Mendelian biologists makes in my
notation $m/n = 2^p - 1$, where p is a positive integer. I confess that I
cannot follow the arguments on which this is based. The facts seem to me much
more in line with the conditions of stability in chemical solutions. If
there be a solution, say, of Na2SO4 and HCl, the relative proportions of the
four possible substances depend on the rate at which the reactions between
Na2SO4 and HCl and between NaCl and H2SO4 take place. Denoting these
respectively by n and m, if the amount of these four substances be
respectively a, d, b, c, equilibrium will exist if $nad = mbc$. Or, in other
words, the equation of chemical equilibrium is the same as that of the
stability of the population considered. The advantage of this method of
looking at the matter is that it implies no special values of m and n. Short,
therefore, of some fundamental reason for the value $m/n = 2^p - 1$, it is
better to consider that other values may be possible and that facts on one
side or the other are at present of more importance than theories. The only
difference in this case is that either (D, B) and (d, b) or (D, b) and (d, B)
must exist; thus four different compounds cannot all appear together, but if
an average of a large number of examples is taken the result must be the same.

Referring back to the expression for a freely mating population, we see
that the fact that it forms a perfect square is not a sufficient criterion of
stability. All that is stable is the relation of the eyes alone or of the hair
alone. Thus, taking formula (2) and summing each line as regards number,
we have for the total of the first line, or the terms containing (DD),

$$(a^2 + ab + ac + mh)^2 + 2(a^2 + ab + ac + mh)(b^2 + ab + bd + nh) + (b^2 + ab + bd + nh)^2,$$

or

$$(a^2 + ab + ac + mh + b^2 + ab + bd + nh)^2,$$

or

$$(a^2 + 2ab + b^2 + ac + ad + bc + bd)^2,$$

since $m + n = \tfrac{1}{2}$ and $h = 2ad + 2bc$; or

$$(a + b)^2(a + b + c + d)^2;$$

the second line, i.e. the terms containing (Dd), is equal to

$$2(a + b)(c + d)(a + b + c + d)^2,$$

and the third to

$$(c + d)^2(a + b + c + d)^2,$$

and the proportions of the original population (1) are exactly maintained.
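That the eye proportions are maintained whatever the coupling may be can also be verified with numbers. The sketch below assumes the gametic output of expression (2); the name `eye_margin_after_mating` and the trial values are illustrative only.

```python
from fractions import Fraction


def eye_margin_after_mating(a, b, c, d, m):
    """Share of gametes carrying D after one round of free mating.

    Uses the gametic output of expression (2), with n = 1/2 - m.
    """
    n = Fraction(1, 2) - m
    h = 2 * (a * d + b * c)
    new_DB = a * a + a * b + a * c + m * h
    new_Db = b * b + a * b + b * d + n * h
    total = a + b + c + d
    # The new generation numbers total**2 gametes in all.
    return (new_DB + new_Db) / (total * total)


# Arbitrary trial population: the share of D before mating is (a + b)/total = 6/11.
with_coupling = eye_margin_after_mating(5, 1, 2, 3, Fraction(7, 16))
without_coupling = eye_margin_after_mating(5, 1, 2, 3, Fraction(1, 4))
```

Both calls return the original share, which is the numerical counterpart of the factor $(a + b)(a + b + c + d)$ found above.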
Shortly written as before shown, the general formula may be denoted by

$$\{a(D, B) + b(D, b) + c(d, B) + d(d, b)\}^2.$$

This is the typical stable Mendelian population without coupling if
$ad = bc$; if coupling exists, $ad/bc = m/n$ is the criterion, and stability in
the population is only established after many generations.

Suppose equal numbers of two populations mix and mating is free:
suppose also that the coupling ratio is 7, one actually found by Bateson
and Punnett (3). Then if mating is free the first generation will be given
by the ratio

$$1\,(DD, BB) : 2\,(Dd, Bb) : 1\,(dd, bb).$$

With a ratio of 7 the next generation will be represented by

$$\{15(D, B) + (D, b) + (d, B) + 15(d, b)\}^2,$$

which when expanded gives

$$\begin{aligned}
 &225(DD, BB) + 30(DD, Bb) + (DD, bb) \\
+\,&30(Dd, BB) + 452(Dd, Bb) + 30(Dd, bb) \\
+\,&(dd, BB) + 30(dd, Bb) + 225(dd, bb).
\end{aligned}$$

Hence $ad/bc = 225$ in place of 7.

The subsequent matings can be easily calculated by the application of
the form in expression (2). The first term is
$(225 + 15 + 15 + \tfrac{7}{16} \cdot 452)^2$, and the rest are found likewise.

Applying the process seriatim with suitable approximations we have
the successive values of $ad/bc$ given in Table I.


Table I.

| Generation | Value of ad/bc |
|---|---|
| After first generation | 225 |
| After second generation | 56 |
| After third generation | 27 |
| After fourth generation | 19 |
| After fifth generation | 14 |
| After sixth generation | 11 |
| After seventh generation | 9.6 |
| After eighth generation | 8.6 |
| After ninth generation | 8.3 |

It  is  thus  seen  that  stability  is  attained  only  after  a considerable  number 
of  generations  in  a free-mating  population  if  coupling  exists. 
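The slow approach to stability can be reproduced by applying expression (2) repeatedly. The sketch below is one way of doing so, assuming an equal mixture of the two pure races and a coupling of 7 to 1; the function name `coupling_ratios` is illustrative. Since it iterates without the approximations mentioned in the text, its intermediate values need not agree digit for digit with those of Table I.

```python
def coupling_ratios(generations, m=7 / 16, n=1 / 16):
    """Successive values of ad/bc under free mating with coupling m : n.

    Starts from equal numbers of the pure races (D,B) and (d,b) and applies
    the gametic output of expression (2) once per generation.
    """
    a, b, c, d = 0.5, 0.0, 0.0, 0.5
    ratios = []
    for _ in range(generations):
        h = 2 * (a * d + b * c)  # double hybrids (Dd, Bb)
        a, b, c, d = (
            a * a + a * b + a * c + m * h,
            b * b + a * b + b * d + n * h,
            c * c + c * d + a * c + n * h,
            d * d + b * d + c * d + m * h,
        )
        total = a + b + c + d  # rescale so that the proportions sum to unity
        a, b, c, d = a / total, b / total, c / total, d / total
        ratios.append(a * d / (b * c))
    return ratios


ratios = coupling_ratios(100)
```

The first value is 225, as in the text, and the sequence falls steadily towards the stable value 7.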

It is possible to introduce a shortened notation. In all circumstances
these populations after one generation consist of numbers which are those
of a perfect square. If we write this in the following way we can at once
proceed to the full expression with little trouble.

Each of the four sides of the complete expression is the square of the
corresponding terms of the contracted expression, and the term in the
middle the sum of twice the product of the diagonal elements.

One  or  two  other  examples  of  the  rate  at  which  stability  is  approached 
in  one  generation  are  shown  in  the  following  table : — 


Table II., showing Rate of Approximation to a Stable Population.

Before quoting any examples of Dr Beddoe's figures it will be well to
state clearly his hair and eye categories. He recognises five types of hair
colour. The meanings of these types seem to me as follows:

(1)  Jet  black. — This  is  a true  single  hue,  and  persons  possessing  this 
colour  of  hair  are  with  few  exceptions  those  who  possess  two  distinct  pure- 
black  elements  in  the  gametes.  The  exceptions,  so  far  as  I have  seen,  are 
a few  persons  who  have  one  red  and  one  jet-black  element.  In  manhood 
this  may  resemble  jet  black  very  closely,  but  the  colour  of  the  hair  on  the 
body  usually  shows  some  trace  of  the  ruddy  pigment.  These  are,  however, 
so  few  in  number  that  they  do  not  disturb  the  calculation. 

(2)  Dark  hair. — This  is  really  a mixture  consisting  of  one  jet-black 
element  and  one  element  of  either  medium  hair  or  fair  hair.  Black  is 
thus  imperfectly  dominant. 

(3)  Brown  hair. — This  consists  of  those  who  are  true  brown  or  medium 
and  of  those  who  possess  one  brown  element  and  one  fair  element. 

(4)  Fair  hair. — This  again  is  a pure  pigment,  the  person  possessing  it 
having  two  fair  elements. 

(5)  Red  hair. — In  this  group  are  included  the  pure  reds,  the  mixtures 
of  red  and  fair  hair,  and  the  mixtures  of  red  and  brown  hair. 

For  purposes  of  analysis  it  is  necessary  to  combine  the  last  three  classes. 

Eyes  are  more  difficult.*  Dr  Beddoe  recognises  three  classes : — 

(1)  Light  eyes. — This  includes,  in  my  opinion,  the  pure  blue,  the  grey 
or  pale  yellow,  and  the  mixture  of  these.  All  are  distinct  varieties,  and  can 
be  distinguished  with  fair  accuracy  after  a certain  amount  of  practice. 

(2)  Mixed  eyes. — This  class  contains  a certain  proportion  of  those  eyes 
which  are  a mixture  of  the  shades  of  eye  just  mentioned  and  of  the 
chocolate  and  dark-yellow  eyes. 

(3)  Dark  eyes. — This  class  contains  all  the  pure-dark  eyes  and  I think 
the  pure-yellow  eyes,  as,  on  account  of  the  manner  in  which  the  dark 
pigment  of  the  back  of  the  iris  frequently  shows  both  internally  and 
externally,  these  may  look  dark  except  on  careful  inspection.  It  also 
contains  many  eyes  which  a moment’s  careful  inspection  would  show  to  be 
either  mixed  dark  and  grey  or  dark  and  blue  eyes.  The  latter  types  of 
eye  are  much  more  common  than  the  true  dark  or  chocolate  eye.  That 
they  have  not  been  more  definitely  distinguished  is  somewhat  surprising. 

It  is  obvious  from  what  has  been  said  that  the  last  two  classes  must  at 
least  in  the  first  instance  be  placed  together. 

We  thus  have  six  equations  to  determine  four  unknown  quantities. 
The  success  of  this  fitting  must  be  the  test  of  the  truth  of  these  statements. 

* See Appendix.

As an example, the figures Dr Beddoe obtained by observation in the
town of Caen are given. The numbers are as follows, the theoretical numbers
being printed in brackets:

| | Light, Medium and Red Hair | Dark Hair | Jet-black Hair | Total |
|---|---|---|---|---|
| Light eyes | 149.5* | 27 | 1.5* | 178 |
| Light eyes (theoretical) | (149.5) | (27.15) | (1.11) | |
| Mixed and dark eyes | 51.5* | 93.5* | 16 | 161 |
| Total | 201 | 120.5* | 17.5* | 339 |
| Total (theoretical) | (201) | (120.0) | (17.9) | |

In this case $a^2 = 149.5$ and $(a + b)^2 = 178$. This gives on solution
$a = 12.23$ and $b = 1.11$, so that we obtain $2ab = 27.15$ as against 27 found
and $b^2 = 1.23$ as against 1.5 found. Whether we regard the 1.5 as really one
individual or two, the fit is exceedingly good. The same process applied to
the total gives $(a + c)^2 = 201$ and $(a + b + c + d)^2 = 339$, so that
$(a + c) = 14.18$ and $(b + d) = 4.23$, which give $2(a + c)(b + d) = 120$ as
against 120.5 and $(b + d)^2 = 17.9$ as against 17.5 found.
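The arithmetic of the Caen example may be sketched as follows. The function name `fit_hair_classes` is illustrative; it simply takes the square roots described above. Because it carries the roots to full precision, its results differ slightly in the second decimal place from the rounded figures quoted in the text.

```python
import math


def fit_hair_classes(light_hair_count, row_total):
    """Expected (light, dark, jet-black) numbers for one row of the table.

    light_hair_count is taken as the square of the first element and
    row_total as the square of the sum of both elements.
    """
    first = math.sqrt(light_hair_count)
    second = math.sqrt(row_total) - first
    return first * first, 2 * first * second, second * second


caen_light_eyes = fit_hair_classes(149.5, 178)  # a^2, 2ab, b^2
caen_all_eyes = fit_hair_classes(201, 339)      # (a+c)^2, 2(a+c)(b+d), (b+d)^2
```

The three numbers returned for each row necessarily add up to the row total, so the test of the theory lies in how the total is divided between the dark and jet-black classes.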

This example illustrates how the race mixture can be analysed and the
closeness with which the numbers accord with such distribution of the
population as is given by the Mendelian theory. Such complete correspondence
is of course rare. Another example almost equally good is that of
Bradford. Here the numbers are even larger, the sample of the population
observed numbering 1400 persons. In this case the theoretical numbers are
printed in brackets above the actual:

| | Light, Medium and Red Hair | Dark Hair | Jet-black Hair | Total |
|---|---|---|---|---|
| Light eyes (theoretical) | (663) | (117.8) | (5.24) | |
| Light eyes | 663 | 117 | 6 | 786 |
| Total, all eyes (theoretical) | (968) | (392.4) | (39.6) | |
| Total, all eyes | 968 | 387 | 45 | 1400 |

The method of testing the suitability of such fitting is that given by
Professor Pearson (4). The differences are taken between each theoretical and
actual number; these are squared, divided by the corresponding theoretical
number, and summed. In the case of the totals this is equal to

$$\frac{(45 - 39.6)^2}{39.6} + \frac{(387 - 392.4)^2}{392.4},$$

or .81.

* Where .5 occurs, the indications were so nearly equal that the individual was recorded
half in one class and half in another.

This sum is denoted by the symbol $\chi^2$; the value of P or the probability
that the fit might be worse is then obtained from the published tables (5).
For the above figures P = .67; that is, if 1400 persons were observed by
random sampling 100 times, in 67 of these a worse fit might be expected
than that found. In the case of the upper line the fit is practically
perfect.
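The test may be sketched for the Bradford totals. The sum is computed exactly as described; for P the sketch assumes, as an inference from the figures quoted and not from any statement in the text, that the tabled value for three classes is $e^{-\chi^2/2}$ (the chi-square tail for two degrees of freedom). The function names are illustrative.

```python
import math


def pearson_chi_square(observed, expected):
    """Sum of squared differences, each divided by the theoretical number."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_worse_fit(chi_square):
    """Probability of a worse fit, assuming two degrees of freedom."""
    return math.exp(-chi_square / 2)


# Bradford, all eyes: light, dark and jet-black hair.
chi_square = pearson_chi_square([968, 387, 45], [968, 392.4, 39.6])
p = p_worse_fit(chi_square)
```

With these inputs the sum comes to about .81 and P to about .67, in agreement with the values given above.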

In what follows, the figures relating to Scotland are chiefly used.
Concerning the suitability of these it may be remarked that (excluding Glasgow
and Edinburgh, where the recent Irish immigrations have introduced a
large element unassimilable on account of the difference in religion, and
which therefore fulfil none of the conditions necessary to the application
of the present theory) Dr Beddoe made observations in 43 localities in
which the characteristics of the hair and eyes were noted in more than
150 persons.

If 43 cases are noted at random, the number of good fits and bad fits
may easily be calculated from the probability table already referred to.
We find $\chi^2$ should be less than unity in .393 of the cases; greater than
unity and less than two in .239; greater than two and less than three in
.145; and greater than three in the remainder, namely, .223. The
following table is divided into two classes: the towns with the larger
districts, and the country districts. It is seen that the number expected not
only is realised but largely exceeded; in other words, except for the fact
that the number of towns and large districts in which $\chi^2$ is greater than
three is twice that expected, the number of small values of $\chi^2$ is much in
excess of that required. The exception is to be expected, as into these
towns specially the immigration has been much the greatest in recent
years.
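The proportions just quoted can be recovered on the same assumption as before, namely that the probability table for three classes follows $P = e^{-\chi^2/2}$; this is an inference from the figures and not a statement of the text. The names `share_between` and `BANDS` are illustrative.

```python
import math


def share_between(lower, upper=math.inf):
    """Chance that chi-square lies between lower and upper (two d.f. assumed)."""
    return math.exp(-lower / 2) - math.exp(-upper / 2)


BANDS = [(0, 1), (1, 2), (2, 3), (3, math.inf)]
shares = [share_between(lower, upper) for lower, upper in BANDS]
expected_of_43 = [43 * share for share in shares]
```

The four shares sum to unity, and multiplying them by the number of districts in each class gives the theoretical rows of the table.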


Table III., showing the Distribution of the Forty-three Districts
in Scotland according to the Actual Findings and the
Theoretical Proportions expected by the Theory of Chance.

| Values of $\chi^2$ | | 0-1 | 1-2 | 2-3 | 3- |
|---|---|---|---|---|---|
| Towns and large districts | Actual | 6 | 2 | 0 | 5 |
| | Theoretical | 5.1 | 3.2 | 1.9 | 2.9 |
| Small districts | Actual | 22 | 3 | 2 | 3 |
| | Theoretical | 11.7 | 7.1 | 4.4 | 6.7 |
| Total | Actual | 28 | 5 | 2 | 8 |
| | Theoretical | 16.7 | 10.3 | 6.3 | 9.7 |

For comparison of hair and eyes a further selection has been made.
Only those towns and districts in which $\chi^2$ is less than unity have been
analysed, as it is only in those we can expect sufficient freedom of mating
to allow of the degree of coupling being determined.

These number twenty-seven in all. An analysis has been made in the
manner already indicated. The values of a and b have been determined
from the numbers showing the combinations of hair and light eyes, and
the values of a + c and b + d similarly from the totals of each colour of hair
when all eyes are grouped together. The four elements of the population
are thus found, and a, b, c, d being thus known, the ratio of ad to bc may
be calculated and the degree of coupling of the eyes and hair known.
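The whole procedure may be sketched for the Bradford figures given earlier (light eyes 663 and 786, all eyes 968 and 1400). The function name `gametic_elements` is illustrative, and the roots are taken without rounding.

```python
import math


def gametic_elements(light_eyes_light_hair, light_eyes_total,
                     light_hair_total, grand_total):
    """Return the four elements a, b, c, d from four observed numbers.

    a^2 = light eyes with light hair, (a + b)^2 = all light eyes,
    (a + c)^2 = all light hair, (a + b + c + d)^2 = whole sample.
    """
    a = math.sqrt(light_eyes_light_hair)
    b = math.sqrt(light_eyes_total) - a
    c = math.sqrt(light_hair_total) - a
    d = math.sqrt(grand_total) - a - b - c
    return a, b, c, d


a, b, c, d = gametic_elements(663, 786, 968, 1400)
coupling_ratio = a * d / (b * c)  # the ratio of ad to bc
```

For Bradford this gives a ratio of about 8.4.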

In the adjoining table these values are given. The numbers of persons
observed and the probable proportions in which the present population is
derived from the three great races of Europe are given for comparison:


Table IV., showing the Constitution of the Population in Different Districts
in Scotland, with Dr Beddoe's Reference Numbers.

| District | No. of Persons | Teutonic Race | Alpine Race | Mediterranean Race | (D, B) | (D, b) | (d, B) | (d, b) | R = m/n | $\chi^2$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 15. Beauly, etc. | 170 | 47 | 32 | 21 | 7.3 | 1.3 | .7 | .7 | 5.6 | <1 |
| 16. Inverness town | 200 | 32 | 42 | 26 | 6.5 | 1.4 | .9 | 1.2 | 6.2 | <1 |
| 18. Inverness district | 500 | 38 | 39 | 23 | 7.0 | 1.3 | .7 | 1.0 | 7.7 | <1 |
| 19. Keith, etc. | 200 | 36 | 40 | 24 | 6.7 | 1.18 | .62 | 1.13 | 9.8 | <1 |
| 20b. Forres | 210 | 37 | 46 | 17 | 7.4 | .9 | .9 | .8 | 7.3 | <1 |
| 30. Kirkcaldy, etc. | 300 | 44 | 39 | 17 | 7.4 | .9 | .9 | .8 | 7.3 | <1 |
| 34. Perth | 665 | 42 | 36 | 22 | 7 | 1 | .83 | 1.17 | 9.8 | <1 |
| 37. Auchterarder | 180 | 43 | 33 | 24 | 6.8 | 1.1 | .8 | 1.28 | 9.9 | 2.5 |
| 38. Forteviot | 300 | 42 | 35 | 23 | 6.4 | .8 | 1.37 | 1.43 | 8.3 | 1.2 |
| 40. Callander | 150 | 37 | 37 | 26 | 7.0 | 1.6 | .4 | 1.0 | 10.9 | <1 |
| 47. Breadalbane | 199 | 41 | 30 | 29 | 6.5 | 1.6 | .69 | 1.3 | 8.8 | <1 |
| 51. Athol | 290 | 39 | 39 | 22 | 7.1 | 1.1 | .73 | 1.07 | 9.5 | <1 |
| 57. Great Glen | 200 | 44 | 38 | 18 | 7.5 | 1.2 | .66 | .64 | 6.0 | <1 |
| 72. Ayr | 500 | 42 | 36 | 22 | 7.2 | 1.2 | .69 | .99 | 9.1 | 2.3 |
| 73. Maybole | 250 | 39 | 41 | 20 | 7.4 | 1 | .67 | .93 | 10.3 | <1 |
| 74. Sanquhar | 200 | 36 | 41 | 23 | 9.7 | 1.7 | 1.25 | 1.76 | 7.8 | <1 |
| 70. Upper Galloway | 250 | 38 | 41 | 21 | 6.99 | 1.17 | .93 | .91 | 5.8 | <1 |
| 78. Dumfries | 200 | 39 | 42 | 19 | 6.9 | .9 | 1.25 | .95 | 4.6 | <1 |
| 86. Leith, etc. | 200 | 46 | 37 | 17 | 7.8 | .7 | .48 | 1.02 | 23.3 | <1 |
| 88. Dunbar | 150 | 42 | 44 | 14 | 9.6 | .9 | .89 | .85 | 10.2 | 1.2 |
| 89. Midlothian | 300 | 54 | 32 | 14 | 7.8 | .8 | .76 | .64 | 8.2 | <1 |
| 90. Newhaven | 176 | 52 | 33 | 15 | 9.7 | .8 | 1.48 | 1.32 | 10.2 | 7 |
| 100. Duns | 230 | 48 | 42 | 10 | 7.4 | .5 | 1.65 | .45 | 4.1 | <1 |
| 109. Jedburgh | 150 | 44 | 39 | 17 | 8.8 | .6 | 1.27 | 1.57 | 18.1 | <1 |
| 115. Rulewater, etc. | 180 | 47 | 38 | 15 | 10.1 | .8 | 1.32 | 1.2 | 11.4 | <1 |
| 116. Teviotdale | 272 | 44 | 40 | 16 | 7.45 | .7 | 1 | .83 | 8.8 | <1 |
| 117. Langholm | 200 | 44 | 42 | 14 | 7.33 | .61 | 1.24 | .82 | 7.8 | <1 |

It is seen on inspection that in these twenty-seven cases the degree of
successful fitting when the persons with light eyes are considered is
exceedingly good. In twenty-two cases $\chi^2$ is less than unity as against 10.6
expected; but as there must be some correlation between the two sets of
results, the great excess is not unexpected. In two cases $\chi^2$ is between
1 and 2, and in two between 2 and 3. In two of the latter cases
the presence of a single individual would make the fit good, and only one
individual could be expected considering the small numbers observed.
It may be taken, then, that in these twenty-seven districts at the present
moment the conditions for the applications of the theory may be held
to exist.

In the table just given the value of the ratio ad : bc is stated in each
case. For convenience it will in future be noted by the letter R. It has
a wide range of variation in value. The lowest value is 4.1 and the
highest 23.3; but of the twenty-one different values eighteen lie between
7 and 11. The mean is 9.14, and the probable error of this ±.48.
A number, however, such as the ratio at present considered has for each
individual observation a very high probable error. I have been unable to
evaluate the expression for the probable error of R in terms of the frequencies,
and it is difficult to make a reliable estimate of this; but by an application
of the formula given by Mr Udny Yule (6) for the probable error of the same
ratio in the fourfold division, it must be large. The average number of
observations in each case does not much exceed two hundred, and, taking
this value and making a rough estimate, it would seem that the probable
error when R = 9 is 2. That is to say, that in half the cases R should lie
between 7 and 11. As we have just seen, two-thirds lie in this interval.
When these ratios are considered from the point of view of the median it
is found that the latter lies almost exactly in the same place. As small
values of the ratio are just as likely to arise from emigration as large
values from immigration, it therefore seems probable that the number 9
approximately represents the value of the ratio. The only value which
is possible on the current theory of Mendelism is 7, namely $2^3 - 1$. The
observations do not favour this value, so that the latter cannot be taken with
reasonable probability.
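The summary figures for R can be recomputed from the twenty-seven values of Table IV. The sketch below takes the probable error of the mean as 0.6745 times the standard deviation divided by the square root of the number of cases; that convention is an assumption here, since the text does not say how the figure ±.48 was reached, and the result need not agree with it exactly.

```python
import math
import statistics

# The twenty-seven values of R, in the order of Table IV.
R_VALUES = [
    5.6, 6.2, 7.7, 9.8, 7.3, 7.3, 9.8, 9.9, 8.3,
    10.9, 8.8, 9.5, 6.0, 9.1, 10.3, 7.8, 5.8, 4.6,
    23.3, 10.2, 8.2, 10.2, 4.1, 18.1, 11.4, 8.8, 7.8,
]

mean_R = statistics.mean(R_VALUES)
median_R = statistics.median(R_VALUES)
between_7_and_11 = sum(7 <= r <= 11 for r in R_VALUES)

# Probable error of the mean under the assumed convention.
probable_error = 0.6745 * statistics.stdev(R_VALUES) / math.sqrt(len(R_VALUES))
```

The mean comes to 9.14, and eighteen of the twenty-seven values, or two-thirds, lie between 7 and 11.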

Leaving Scotland for further verification, it seems best to take only
large numbers. Dr Beddoe gives eight instances in which the criteria
demanded in Scotland approximately hold, and in which the numbers
observed are upwards of four hundred. These are collected in Table V.,
p. 469.

The mean value of R in the case of these towns is 9.4, with a probable
error of ±.67, so that they show no certain difference from the result
obtained. If anything, they render the value 7 obtained by the
Mendelians less probable. In the absence of other evidence we may take
it that R = 9, and that if it differs much from that, it is in excess rather
than in defect.

As the result of these calculations it is seen that if we take collectively
all those with light eyes and distribute them according to the colour of the
hair, the number of those with dark hair is always equal to twice the
product of the square roots of the numbers of those possessing light hair
and black hair. The proportions in which the eyes are divided among the
different types of hair show also that something mathematically equivalent
to coupling takes place with apparent uniformity. This is the Mendelian
law, and the evidence seems to me sufficient to prove that something at
least analogous to segregation takes place. Whether the actual mechanism
is Mendelian or not, it is evident that any other theory which seeks support
must lead to the same numerical relationship.


Table V., showing the Values of R in several Large Towns
and Districts.

| Reference to Races of Britain | Place | Number of Observations | R |
|---|---|---|---|
| Page 162 | Manchester | 475 | 9 |
| Page 179 | St Austell | 850 | 8.6 |
| Page 180 | Truro | 500 | 10.3 |
| Page 183 | Gloucester | 500 | 10 |
| Page 177 | Chippenham | 650 | 6.8 |
| Page 163 | Bradford | 1400 | 8.4 |
| Page 199 | Bourges | 420 | 10.8 |
| Page 212 | Vienna | 1700 | 10.8 |

We now come to the discussion of mixed and dark eyes. Light eyes
have been shown to fulfil the necessary conditions for Mendelian
inheritance, but the other groups evidently have some different significance.
This is best understood by referring again to Expression (1), or

$$\begin{aligned}
 &a^2(DD, BB) + 2ab(DD, Bb) + b^2(DD, bb) \\
+\,&2ac(Dd, BB) + (2ad + 2bc)(Dd, Bb) + 2bd(Dd, bb) \\
+\,&c^2(dd, BB) + 2cd(dd, Bb) + d^2(dd, bb),
\end{aligned}$$

which is stable if $R = \dfrac{ad}{bc}$.

The ratios of mixed eyes to dark eyes in each class of hair are therefore

$$\frac{2ac}{c^2}, \quad \frac{2ad + 2bc}{2cd}, \quad \frac{2bd}{d^2},$$

or

$$2, \quad 1 + \frac{bc}{ad}, \quad \frac{2bc}{ad},$$

or, cancelling and multiplying by $\dfrac{ad}{bc}$,

$$2R, \quad R + 1, \quad 2.$$


This ratio is evidently independent of the relative proportions of the
different elements of the population. If R = 9, which is the value it
approximates to in the majority of cases, this ratio becomes 9 : 5 : 1. Nine
districts in Scotland have values of R approximating to 9; they range from
8.2 to 9.9. The relative proportions of mixed and of dark eyes are given
in the following table:


Table VI. Percentage Mixed and Dark Eyes associated with
each Class of Hair.

| | Light Hair, Mixed Eyes | Light Hair, Dark Eyes | Dark Hair, Mixed Eyes | Dark Hair, Dark Eyes | Black Hair, Mixed Eyes | Black Hair, Dark Eyes |
|---|---|---|---|---|---|---|
| Selected districts | 6.7 | 6.7 | 7.3 | 11.5 | .74 | 3.14 |
| All districts | 6.9 | 6.3 | 6.3 | 12.6 | .82 | 2.56 |
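The theoretical ratios 2R : R + 1 : 2 with which these percentages are to be compared can be checked against Expression (1) directly. The sketch below is illustrative only: the function names and the two trial populations are assumptions, chosen so that ad/bc = 9 in both while the proportions of the elements differ.

```python
from fractions import Fraction


def mixed_to_dark(a, b, c, d):
    """Ratio of mixed (Dd) to dark (dd) eyes for light, dark and black hair."""
    light_hair = Fraction(2 * a * c, c * c)
    dark_hair = Fraction(2 * a * d + 2 * b * c, 2 * c * d)
    black_hair = Fraction(2 * b * d, d * d)
    return light_hair, dark_hair, black_hair


def relative_to_black_hair(a, b, c, d):
    """The same three ratios, scaled so that the black-hair value is 1."""
    light_hair, dark_hair, black_hair = mixed_to_dark(a, b, c, d)
    return light_hair / black_hair, dark_hair / black_hair, Fraction(1)


# Two different populations, both with ad/bc = 9.
first = relative_to_black_hair(9, 1, 1, 1)
second = relative_to_black_hair(18, 2, 3, 3)
```

Both populations give 9 : 5 : 1, which illustrates that the ratio depends on R alone.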

It is a matter of observation that many mixed eyes are classed as dark,
and  it  seems  reasonable  to  suppose  that  a fixed  proportion  are  so  classed ; 
but  the  figures  given  by  the  selected  districts  cannot  be  adapted  to  the 
ratios  given  above  by  transferring  the  same  proportion  from  each  group  of 
mixed  eyes  to  the  corresponding  group  of  dark  eyes  which  we  have  shown 
takes  place.  The  numbers,  however,  in  the  last  group,  that  of  black  hair, 
are  small,  and  the  error  of  the  ratio,  which  is  approximately  1 in  4, 
may  be  large. 

The  second  group  of  ratios — i.e.  that  for  the  whole  twenty-seven  groups — 
is  more  nearly  in  accord  with  the  supposition  that  a fixed  proportion  of  the 
mixed  eyes  are  called  dark ; but  it  would  seem  probable  that  with  each 
change  of  the  constitution  of  the  gamete  as  regards  hair  colour  a mixed 
eye  tends  to  assume  a darker  hue  to  the  casual  observer,  though  it  may 
well  be  that  this  is  due  as  much  to  the  colour  of  the  eyelashes  as  of  the 
eye itself. In fact, the difference to be explained is not so great but that
it might be accounted for on this supposition.

One other point requires to be considered. In this paper fair-haired
and medium-haired persons have been classed together, and the question
arises as to the effect this may have on the relative proportions of mixed
and dark eyes, as it might well be that a mixed grey and chocolate eye and
a mixed blue and chocolate eye would impress an observer differently.
Personal observation renders it probable that the latter is more often
classed as dark, and the figures bear out this observation. The proportions
are shown in the accompanying table:


Table VII.

| Ratio of Fair to Medium Hair | Light Hair, Mixed Eyes | Light Hair, Dark Eyes | Ratio | Dark Hair, Mixed Eyes | Dark Hair, Dark Eyes | Ratio | Black Hair, Mixed Eyes | Black Hair, Dark Eyes | Ratio |
|---|---|---|---|---|---|---|---|---|---|
| >1.2 | 5.8 | 5.1 | 1.13 | 5.4 | 11.5 | .47 | .95 | 2.60 | .37 |
| >1 <1.2 | 6.9 | 6.5 | 1.06 | 5.6 | 11.2 | .50 | .84 | 2.24 | .37 |
| <1 | 6.2 | 6.8 | .91 | 5.7 | 11.3 | .50 | .74 | 2.91 | .25 |

It  is  to  be  noted  that  the  ratio  of  mixed  to  dark  eyes  tends  in  the 
groups  of  light  hair  and  black  hair  to  decrease  with  the  decrease  of  light 
hair  and  to  remain  constant  in  the  group  of  dark  hair.  From  such  facts 
no  certain  inferences  can  be  drawn,  but  the  suggestion  is  that  a mixed  blue 
and  chocolate  eye  is  somewhat  darker  on  the  average  than  a mixed  grey 
and  chocolate  eye. 


CONCLUSIONS.

(1)  Many  of  Dr  Beddoe’s  populations  are  stable  in  a Mendelian  sense. 
Though  this  does  not  necessarily  imply  that  the  theory  as  stated  by  Mendel 
is  the  only  explanation  of  the  arithmetical  proportions  found,  any  other 
theory claiming to explain the facts of heredity must also explain these
relative  proportions. 

(2)  That  linkage  between  hair  colour  and  eye  colour  exists.  The 
coupling  factor  is  more  likely  to  be  9 than  7,  and  therefore  does  not  agree 
with  the  present  Mendelian  theory.  It  is  quite  possibly  to  be  explained 
on  the  analogy  of  chemical  equilibrium. 

(3) That it is possible that the colour of the hair has, in addition to this,
some other effect in altering the colour of the eyes; but the evidence is not
sufficient to prove this, and it may be only due to the fact that dark
eyelashes tend to lend a darker appearance to eyes than would be found
justified on a more careful examination.

(4) A further result of the analysis made in this paper is that Dr
Beddoe’s figures give no suggestion of the presence of any race in this
country which had different hair and eye relationships from those pertaining
to the three races generally considered to form the basis of the
European population. This, of course, does not exclude the possibility of
an older race surviving in sufficient numbers to form a considerable part of
the British population; but, so far as the survey is valid, this race must
have had a hair and eye complex closely allied to one or other of the hair
and eye complexes considered in this paper.


APPENDIX. 

On the Categories of Eye Colour, with a Record of One Observation.

Eye  colour  is  the  subject  of  much  controversy.  I am  personally  of  the  opinion  that 
all  categories  that  have  been  described  are  very  imperfect.  In  the  first  place,  apart 
from  actual  colour,  the  pigment  of  the  posterior  layer  of  the  iris  may  be  seen  at 
times  with  more  or  less  prominence  along  the  inner  and  outer  edges  of  the  iris,  often 
causing  the  eye  to  appear  darker  than  the  colour  alone  would  permit. 

Again, mixed eyes are of two kinds: those in which the pigment is (1) diffuse and
(2) discrete, that is, in spots; but as far as my observations go, I have never seen
pigment in the eyes of children which was not present in the eyes of one or other of the
parents. In mixed eyes the pigment tends to collect more markedly near the inner
edge of the iris, so that in a mixed chocolate and grey eye we may have both the
chocolate and the grey pigment in the inner part, and the outer edge simulating a
blue eye.

Of  actual  types  of  pure  as  distinct  from  mixed  eyes  I recognise  four : — 

(1) The pure blue eye, in which there is no pigment in the iris, such grey
    as appears being due to strands of connective tissue.

(2) The grey or pale yellow, in which there is always visible pigment present
    in little masses, quite distinct from definite strands of connective tissue.

(3) The deep yellow eye, a more or less rare form, not much exceeding 1 per
    cent. of the adult population as seen in Glasgow.

(4) The dark-brown or chocolate eye, of which the shades vary, but in all of
    which the iris is sensibly the same colour from the inner margin to the
    outer.

All  these  types  of  eyes  may  be  found  mixed,  and  as  regards  eyes  the  population 
may  be  taken  as  given  by 

$$m^2(a, a) + n^2(b, b) + p^2(c, c) + q^2(d, d) + 2mn(a, b) + 2mp(a, c)$$
$$+\; 2mq(a, d) + 2np(b, c) + 2nq(b, d) + 2pq(c, d).$$

Now, some of these types are very difficult to distinguish, especially in children.
The varieties most difficult to distinguish are: (1) the mixture of yellow
and grey from the mixture of chocolate and grey, a small amount of chocolate
pigment being not unlike yellow; and (2) the mixture of chocolate and grey from the
mixture of chocolate and blue, the connective tissue of the latter simulating grey
pigment when masked by a veil of chocolate pigment.

Last summer I examined a school of nearly one hundred children in Skye, a
school where the population may be considered free-mating and uncontaminated by
immigration. As each child was shown to me I stated to an amanuensis my decision
concerning the eye colour, and the numbers are as follows:


Class  1. Pure blue                  12
Class  2. Pure grey                   9
Class  3. Dark yellow                 1
Class  4. Chocolate                   4
Class  5. Mixed blue and grey        23
Class  6. Blue and yellow             2
Class  7. Blue and chocolate          1
Class  8. Grey and yellow            18
Class  9. Grey and chocolate         18
Class 10. Yellow and chocolate        3


The difficulties above mentioned show themselves at once; but if classes 3, 4,
and 10 be combined, and if classes 6, 7, 8, and 9 be also combined, we have the
following figures:


                                   Actual      Theoretical*
                                   Figures.    Proportions.

Pure blue                            12           12.39
Mixed grey and blue                  23           21.54
Pure grey                             9            9.36
Mixed blue or grey and
  chocolate or yellow                39           38.95
Chocolate and yellow                  8            8.76
Pure and mixed                       91           91.00

These  results  are  too  close  to  be  wholly  chance,  but  as  it  is  a solitary  instance 
they  are  advanced  with  diffidence.  They  are,  however,  in  complete  accordance  with 
those  given  in  the  preceding  notes  on  “ Inheritance  of  Hair  and  Eye  Colour.” 


* Fitted  by  the  method  of  least  squares. 
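
The theoretical column above has the form expected of a freely mating population carrying three kinds of eye unit: blue, grey, and dark (yellow and chocolate pooled). The following Python sketch checks that internal consistency. It does not repeat the least-squares fit; it assumes, purely for illustration, that the proportion of each unit can be read back from the corresponding "pure" theoretical class by a square root. The names `m`, `n` and `r` (the last for the pooled dark units) are labels introduced here and are not the author's.

```python
from math import sqrt

TOTAL = 91.0  # children in the combined table

# Theoretical figures for the three "pure" classes, taken from the table.
pure_blue, pure_grey, pure_dark = 12.39, 9.36, 8.76

# Assumption: each pure class equals TOTAL * (unit proportion) ** 2.
m = sqrt(pure_blue / TOTAL)   # blue units
n = sqrt(pure_grey / TOTAL)   # grey units
r = sqrt(pure_dark / TOTAL)   # yellow and chocolate units, pooled

unit_sum = m + n + r                          # should be close to unity
mixed_blue_grey = 2 * m * n * TOTAL           # table gives 21.54
mixed_light_dark = 2 * (m + n) * r * TOTAL    # table gives 38.95

print(round(unit_sum, 4), round(mixed_blue_grey, 2), round(mixed_light_dark, 2))
```

If the three proportions sum to unity and the two mixed classes are reproduced, the tabulated theoretical figures are those of a stable population in the sense used in the paper.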


The Theory of Probable Error and its Application to Vital Statistics,
by John Brownlee, M.D., D.Sc., Physician Superintendent, City
of Glasgow Fever Hospital, Ruchill.


WITH the increase of the use of statistics in public health it is
becoming increasingly important that an accurate knowledge of
the processes by which results are arrived at should be in the hands of all
working with figures. The theory of error was originally developed in
connection with games of chance, further developed to suit the requirements
of astronomy, and contemporaneously applied from a different point
of view to the construction of life tables. In recent years these two
applications have converged, till it is now possible to apply many results
deduced from the theory of chance to the discussion of problems which
could formerly only be attacked by the method of finite differences.

2. Modern mathematical analysis has developed very specially three
branches. It has greatly extended the application of the method of
curve-fitting to smooth observations. It has brought into use a large number of
methods for calculating the correlation between different qualities. It
has also concerned itself largely with the discussion of probable error. It
is this last branch I intend to treat chiefly to-day.

3. This subject falls naturally into four divisions:

I. The error due to random selection;

II. The assumptions on which the mathematical proofs are
based and the modifications required;

III. The influence of experimental error; and

IV. The method of testing how far theory and observation
agree.


I. 

4. The subject of probable error due to random sampling is as a rule
dismissed in public health text books with a simple statement of Poisson’s
Formula, or with a treatment which almost wholly neglects the limitations
of its application. The actual mathematics, however, required for its
understanding is not very advanced. The general theorem which is of most
importance can be found proved in any elementary text book of Algebra,
and is as follows. If $p$ be the chance of an event happening and $q$
that of it failing to happen, so that $(p + q) = 1$, that is, either the event
happens or it fails, then in $n$ trials the chance of its happening $(n - m)$
times and failing $m$ times is given by the $(m + 1)$th term of the binomial
expansion of $(p + q)^n$, or of

$$p^n + n p^{n-1} q + \frac{n(n-1)}{1 \cdot 2}\, p^{n-2} q^2 + \frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3}\, p^{n-3} q^3 + \ldots,$$

that is,

$$\frac{n(n-1) \ldots (n-m+1)}{1 \cdot 2 \ldots m}\, p^{n-m} q^m.$$

If $p = q$ this expression is symmetrical, and the chance of the event
happening $m$ times is the same as that of it failing $m$ times, the formula
in this case becoming

$$\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^n.$$

It is to be noted that as $p + q$ is equal to unity, $(p + q)^n$ is also equal to
unity, and that if we have $M$ cases the distribution is given by $M(p + q)^n$.

5. Many distributions are described very approximately by one or
other of these formulae. Thus stature, head breadth, head length,
cephalic index, etc., are very closely represented by

$$\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^n,$$

while such cases as the number of persons suffering from enteric fever
at each age period, etc., are described by the formula $(p + q)^n$. But
these distributions are not as a rule used in the forms above given.
Certain curves which can be calculated much more simply have been
found to represent these formulae very closely.

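Thus $\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^n$ is represented by
$y = y_0\, e^{-x^2 / 2\sigma^2}$, commonly called the “Normal Curve of Error,”
and $(p + q)^n$ by $y = y_0 \left(1 + \frac{x}{a}\right)^{\gamma a} e^{-\gamma x}$,
known to statisticians as Type III.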


6. The method in which the form $\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^n$ arises is of special interest.
It is commonly derived from the analogy of coin tossing. Only heads or
tails can occur, and the chance of either is equal. Thus, if we toss a
single coin a large number of times, in the end approximately equal
proportions of heads and tails will ensue. If we toss two coins together a
large number of times, two heads or two tails will each occur once, and a
head and a tail twice, approximately, out of every four times the coins are
spun. If $n$ coins be spun the chance of each combination of heads and
tails is given by the terms of the binomial expression

$$\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^n.$$

It is to be noted here that the chances are quite independent of each
other, as a head or a tail is equally probable at each separate experiment.
If a head in excess denote a positive error, and a tail a negative, we find
that not only are the errors independent, but positive and negative errors
of like size occur with equal frequency. But there is no necessity in
nature for the odds to be equal on both sides. If we take a six-sided
die, for instance, six can only be thrown once on the average for five
times the other numbers are thrown. If we take $n$ dice, then the
proportions in which the sixes will turn up are given by the terms of the
expression

$$\left(\tfrac{5}{6} + \tfrac{1}{6}\right)^n,$$

$n$ sixes turning up only once in $6^n$ times.
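
The terms of these binomial expressions are easily computed. The Python sketch below is an illustration only; the function name `binomial_terms` is introduced here. It evaluates the terms for two coins and for three dice, and shows that all three dice turn up six once in $6^3$ throws.

```python
from math import comb

def binomial_terms(n, q):
    """Chances of the event occurring 0, 1, ..., n times in n trials,
    q being the chance at each trial and p = 1 - q the chance of failure."""
    p = 1.0 - q
    return [comb(n, m) * p ** (n - m) * q ** m for m in range(n + 1)]

two_coins = binomial_terms(2, 0.5)      # number of heads among two coins
three_dice = binomial_terms(3, 1 / 6)   # number of sixes among three dice

print(two_coins)                 # one, two, one out of every four spins
print(1 / three_dice[3])         # about 216, that is 6 ** 3
```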

7. Certain quantities are specially important. The mean of the
observations is one of these, this being regularly used in all statistical work for
purposes of comparison. The next most important is the standard
deviation, which is the square root of the second moment taken round a vertical
line through the mean, and which is equivalent in dynamics to the radius
of gyration.

The mean may be defined as the average value of the quantities
considered. It is obtained by multiplying the size of each unit by the
number of times it occurs, taking the sum of all such values and dividing
this sum by the total number of units considered. Thus, if the size $a$
occurs $m$ times, and the size $b$, $n$ times, the mean is given by

$$\frac{ma + nb}{m + n}.$$

If more sizes exist, and the sum be denoted by $\Sigma$ as usual, then the mean
is given by

$$\frac{\Sigma\, ma}{\Sigma\, m}.$$

8. In the case of $(p + q)^n$ the mean can readily be found. Suppose
the expression expanded as before, and suppose that the frequency value $p^n$
corresponds to the value of the size $h$, and $n p^{n-1} q$ to the value $(h + a)$, etc.,
where $a$ is the increase of value in passing from one term to the next,
then we have at once, as corresponding to the expression $\Sigma\, ma$,

$$p^n h + n p^{n-1} q\,(h + a) + \frac{n(n-1)}{1 \cdot 2}\, p^{n-2} q^2 (h + 2a) + \ldots$$

which equals

$$p^n h + n p^{n-1} q\, h + \frac{n(n-1)}{1 \cdot 2}\, p^{n-2} q^2 h + \ldots$$
$$+\; n a\, p^{n-1} q + n(n-1)\, a\, p^{n-2} q^2 + \frac{n(n-1)(n-2)}{1 \cdot 2}\, a\, p^{n-3} q^3 + \ldots$$
$$= h (p + q)^n + n a q\, (p + q)^{n-1},$$

$$\text{Mean} = \frac{h (p + q)^n + n a q\, (p + q)^{n-1}}{(p + q)^n} = h + n a q, \quad \text{since } (p + q) = 1.$$

The mean may obviously be calculated as a distance from any origin;
it is usual, however, in practice, to calculate it from some point in the
middle of the series of observations, as will be presently shown.

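9. In a similar way the second moment is calculated. This is usually
denoted by $\mu_2$. In this case we multiply the terms by $h^2$, $(h + a)^2$,
$(h + 2a)^2$, .... instead of $h$, $h + a$, $h + 2a$, .... This gives for the
separate terms

$$\mu_2 (p + q)^n = p^n h^2 + n p^{n-1} q\,(h + a)^2 + \frac{n(n-1)}{1 \cdot 2}\, p^{n-2} q^2 (h + 2a)^2 + \ldots$$
$$= h^2 (p + q)^n + 2 a h n q\, (p + q)^{n-1}$$
$$+\; a^2 n q \left\{ p^{n-1} + (n-1)\, 2\, p^{n-2} q + \frac{(n-1)(n-2)}{1 \cdot 2}\, 3\, p^{n-3} q^2 + \ldots \right\}.$$

As the last expression is equal to

$$a^2 n q\, (p + q)^{n-1} + a^2 n(n-1)\, q^2 (p + q)^{n-2},$$

$$\mu_2 = h^2 + 2 a h n q + a^2 n q + a^2 n(n-1)\, q^2.$$

This is the second moment taken about a vertical line at distance $h + naq$
from the centre of gravity.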

10. Supposing now that the origin is at the centre of gravity instead
of the position formerly assumed, it follows that $h + naq = 0$. If we
substitute then $h = -naq$ in the formula for the second moment, we
have as the value of that moment round a vertical line through the
centre of gravity or the mean

$$h^2 + 2 a h n q + a^2 n q + a^2 n^2 q^2 - a^2 n q^2 \quad \text{when } h = -anq$$
$$= a^2 n q - a^2 n q^2$$
$$= a^2 n q (1 - q)$$
$$= a^2 n p q.$$

The standard deviation, usually denoted by $\sigma$, is equal to the square root
of this, and is therefore $a\sqrt{npq}$, or $\sqrt{npq}$ if $a$ be taken as unity, as is
usually done. In general to calculate the second moment round the
ordinate through the centre of gravity, which for shortness is called the
“centroid vertical,” the distance of the mean from some suitable origin
and the second moment round the same origin are calculated. If these
are denoted by $\nu_1$ and $\nu_2$ respectively, then $\sigma^2 = \nu_2 - \nu_1^2$, which is
easily seen to be the case by a modification of the proof given above,
for if the last formula hold

$$\sigma^2 = h^2 + 2 a h n q + a^2 n q + a^2 n(n-1)\, q^2 - (h + anq)^2$$
$$= a^2 n q (1 - q)$$
$$= a^2 n p q, \text{ as already found.}$$
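
The two results, mean $= h + naq$ and $\sigma^2 = a^2 npq$, may be checked numerically. The Python sketch below is illustrative only: the function name `binomial_moments` and the particular values of $n$, $q$, $h$ and $a$ are chosen here and are not taken from the paper. It forms the moments directly from the terms of $(p + q)^n$ and obtains the variance as $\nu_2 - \nu_1^2$.

```python
from math import comb

def binomial_moments(n, q, h=0.0, a=1.0):
    """Mean and variance of a variable taking the value h + k*a with the
    frequency of the term of (p + q)**n that contains q**k."""
    p = 1.0 - q
    freqs = [comb(n, k) * p ** (n - k) * q ** k for k in range(n + 1)]
    sizes = [h + k * a for k in range(n + 1)]
    nu1 = sum(f * x for f, x in zip(freqs, sizes))        # first moment about the origin
    nu2 = sum(f * x * x for f, x in zip(freqs, sizes))    # second moment about the origin
    return nu1, nu2 - nu1 ** 2

n, q, h, a = 50, 0.18, 3.0, 2.0   # illustrative values only
mean, variance = binomial_moments(n, q, h, a)

print(mean, h + n * a * q)                      # both about 21
print(variance, a * a * n * (1 - q) * q)        # both about 29.52
```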

11. As an example, take the number of deaths in each series of
one hundred cases of scarlet fever. Here out of thirty instances the
deaths ranged from 0 to 6, as seen below.

No. of Deaths.   No. of Instances.   Multipliers.   Products for     Products for
                                                    first moment.    second moment.
      0                 1               -3              -3                 9
      1                 6               -2             -12                24
      2                 6               -1              -6                 6
      3                 9                0
      4                 4                1               4                 4
      5                 3                2               6                12
      6                 1                3               3                 9

                       30                              +13                25
                                                       -21                39
                                                        -8                64

The origin has been taken at 3 deaths and the abscissae measured
positively and negatively from this point. The products for the first
and second moments are then found and added together, having regard
to sign. So that we have

$$\nu_1 = -\frac{8}{30}, \qquad \nu_2 = \frac{64}{30}.$$

So that

$$\sigma^2 = \frac{64}{30} - \left(\frac{8}{30}\right)^2 = 2.062, \quad \text{or } \sigma = 1.436.$$

Since $\nu_1 = -\frac{8}{30}$, the mean number of deaths in each hundred cases
is equal to $3 - \frac{8}{30} = 2.73$, since 3 deaths has been chosen as the point
of origin. This is in general much the simplest way of calculating
the mean and the standard deviation.
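
The arithmetic of this example can be followed step by step in a few lines. The Python sketch below is an illustration; the variable names are introduced here. It takes the working origin at 3 deaths, forms $\nu_1$ and $\nu_2$ about that origin, and then applies $\sigma^2 = \nu_2 - \nu_1^2$.

```python
from math import sqrt

deaths = [0, 1, 2, 3, 4, 5, 6]       # deaths per hundred cases
instances = [1, 6, 6, 9, 4, 3, 1]    # number of hundreds showing that many deaths
ORIGIN = 3                           # working origin chosen in the text

total = sum(instances)
# First and second moments about the working origin.
nu1 = sum(f * (d - ORIGIN) for d, f in zip(deaths, instances)) / total
nu2 = sum(f * (d - ORIGIN) ** 2 for d, f in zip(deaths, instances)) / total

variance = nu2 - nu1 ** 2            # second moment about the mean
sigma = sqrt(variance)               # standard deviation
mean = ORIGIN + nu1                  # mean referred back to zero deaths

print(total, round(variance, 3), round(sigma, 3), round(mean, 2))
```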

12. The significance of the standard deviation can be best seen
when two normal curves of equal area are compared. This is shown
in the diagram. Both these curves relate to the same number of cases, $N$.
The equation of the first is

$$y = \frac{N}{\sqrt{2\pi}}\, e^{-x^2/2},$$

and of the second,

$$y = \frac{N}{2\sqrt{2\pi}}\, e^{-x^2/8}.$$

The standard deviation of the first is unity, and of the second two.
It is seen at once that a much greater variation of values takes place in
the second than in the first, or that a very much smaller proportion of
cases having the mean properties is found. In other words, the larger the
standard deviation the less likely it is that the mean value obtained from
the observations represents a large proportion of the values.


Diagram  illustrating  the  meaning  of  probable  error.  The  two  curves 
shown  have  the  same  area,  but  the  standard  deviation  of  the  lower  is  twice 
that  of  the  upper.  The  continuous  vertical  lines  divide  the  upper  curve 
into  parts  so  that  the  centre  area  is  equal  to  the  sum  of  the  two  external 
portions,  and  the  chain  lines  do  the  same  for  the  flatter  curve,  showing 
that  the  greater  the  standard  deviation  the  greater  the  probable  error. 

13.  The  definition  of  the  term  “probable  error”  can  now  be  given. 
It  has  been  determined  by  this  use  of  the  “ normal  ” curve  to  describe  the 
variations  due  to  error  in  observation.  If  we  divide  the  area  of  the  curve 
into  three  portions,  viz.,  one  limited  by  two  ordinates  each  equidistant  from 
the  middle  line,  and  two  portions  external  to  these  ordinates ; so  that  the 
area  of  the  central  portion  is  equal  to  twice  the  area  of  either  of  the 
external  portions,  it  can  be  calculated  that  the  distance  of  the  ordinates 
from the middle line is .67449σ. This is termed the probable error, and
it  signifies  that  the  chances  of  an  observation  falling  into  the  central  portion 
or  into  one  or  other  of  the  external  portions  are  equal. 
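
The factor .67449 can be recovered from the definition. The sketch below uses Python's standard `statistics.NormalDist`, a modern convenience and not anything available to the author. Since half the area lies between the two ordinates, the upper ordinate must cut off three quarters of the area of the normal curve.

```python
from statistics import NormalDist

# Half of the area lies within +/- the probable error, so the upper
# ordinate is the point below which three quarters of the area falls.
probable_error_factor = NormalDist().inv_cdf(0.75)

print(round(probable_error_factor, 5))
```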

14. When the curve is asymmetrical, that is, when it is derived from
$(p + q)^n$, where $p$ is not equal to $q$, the standard deviation still has a
significance as indicating the degree of “scatter,” but it can no longer be
used to measure the deviation on both sides of the mean. The mode or
most probable value is now no longer coincident with the mean, but lies
more or less to one side of it.

15. The probable error has in itself little practical use, since no
inference can be drawn where the odds are equal. The common rule is to take
three times the probable error as indicating the point at which a conclusion
may be taken as fairly probable, but it is better to avoid using the term
“probable error” and consider only the standard deviation, which, as twice
the standard deviation is almost exactly equal to three times the probable
error, occasions no change of argument, and only a small change of
nomenclature. In this connection the standard deviation may well be
called the “standard error,” as is done by Mr. Yule. In the accompanying
table (Table I.) are shown the chances of the observation lying within the
area of the curve limited by distances from the mid-line ±.5σ, ±σ, ±1.5σ,
etc., and the approximate values of the chance of failure in fractions. If
the standard deviation itself is used the odds are two to one in favour of
the actual figures lying within the area bounded by y = ±σ; if twice the
standard deviation be taken, the usual limit, these rise to 21 to 1, while
if three times the standard deviation be used they rise to 369 to 1. Even
in the latter instance, however, the odds must not be considered
overwhelming.

Table I. Showing the Chances that the Actual Observation lies within
y = ± nσ.

   n       Chances of success.     Approximate chances
                                   of failure.
   .5           .3829                   2/3
  1.0           .6827                   1/3
  1.5           .8664                   1/7
  2.0           .9546                   1/22
  2.5           .9876                   1/81
  3.0           .9973                   1/370
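
The chances in Table I are areas of the normal curve and may be recomputed from the error function. The Python sketch below is illustrative, and the helper name `chance_within` is introduced here. It also gives the corresponding odds, which come out close to the 21 to 1 and 369 to 1 quoted in the text.

```python
from math import erf, sqrt

def chance_within(n_sigma):
    """Chance that a normally distributed observation lies within
    +/- n_sigma standard deviations of the mean."""
    return erf(n_sigma / sqrt(2.0))

table = {n: chance_within(n) for n in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)}
odds = {n: c / (1.0 - c) for n, c in table.items()}   # odds in favour, to 1

for n in table:
    print(n, round(table[n], 4), round(odds[n], 1))
```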

16. Before making application of what has been said, it will be well to
observe more particularly what happens when samples of a population are
drawn: (1) theoretically, and (2) in actual instances.

The theory of chance gives us two formulae. If we draw $M$ samples
of, say, $r$ individuals from a very large population, the proportion $p$ of which
consists of one class, and the proportion $q$ of the remainder, then the
numbers in which the different proportions of $p$ and $q$ occur are given
by the several terms of the expression

$$M (p + q)^r.$$

If, however, the number from which samples may be drawn is limited,
say to $n$, $p$ and $q$ remaining as before, the proportions of the different
populations are given by the terms of

$$1 + \frac{r}{1} \cdot \frac{qn}{pn - r + 1} + \frac{r(r-1)}{1 \cdot 2} \cdot \frac{qn\,(qn - 1)}{(pn - r + 1)(pn - r + 2)} + \ldots$$
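
The two cases may be compared numerically. In the Python sketch below the function names and the figures (a population of 1,000 containing 180 of the class in question, samples of 50) are illustrative assumptions, not data from the paper. The second function gives the chances for a limited population in the usual combinatorial form; the ratio of its second term to its first is $r \cdot qn / (pn - r + 1)$, as in the series above.

```python
from math import comb

def infinite_population_terms(r, q):
    """Terms of (p + q)**r: chance of k members of the q-class in a sample of r."""
    p = 1.0 - q
    return [comb(r, k) * p ** (r - k) * q ** k for k in range(r + 1)]

def limited_population_terms(n, marked, r):
    """Chance of k members of the q-class when r are drawn without
    replacement from n individuals, of whom marked (= q * n) belong to it."""
    return [comb(marked, k) * comb(n - marked, r - k) / comb(n, r)
            for k in range(r + 1)]

n_total, marked, r = 1000, 180, 50            # illustrative figures only
limited = limited_population_terms(n_total, marked, r)
unlimited = infinite_population_terms(r, marked / n_total)

print(round(limited[9], 4), round(unlimited[9], 4))
```

With a population twenty times the size of the sample the two sets of terms already lie close together, the limited population giving a slightly narrower distribution.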

17. These are abstract problems, but such problems appear regularly
in public health statistics. I have had a series of these investigated.
In the case of scarlet fever, Belvidere Hospital (1900-1912); scarlet
fever, Ruchill Hospital (1909-1912); enteric fever, Belvidere and
Ruchill Hospitals (1900-1912); and diphtheria, Ruchill Hospital (1909-
1912), all the cases have been tabulated in consecutive groups of 100
to 200 cases (Table II.). In each of these groups the even numbers


Table II. Showing the Number of Deaths in Parallel Series of Cases
Chosen so that the Alternate Cases fall into Different Groups.


Scarlet Fever,*
Belvidere, 1900-1908.
4,700 cases.

Enteric Fever,*
Belvidere and Ruchill,
1900-1912.
1,900 cases.

Diphtheria,†
Ruchill, 1909-1912.
1,800 cases.

Scarlet Fever,†
Ruchill, 1909-1912.
2,000 cases.

1 

2 

1 

12 

8 

10 

13 

4 

3 

2 

_ 

1 

2 

11 

6 

7 

7 

5 

3 

1 

3 

3 

2 

ID 

10 

6 

9 

1 

2 

1 

3 

3 

- 

10 

6 

10 

4 

2 

3 

3 

2 

6 

11 

14 

13 

10 

3 

4 

3 

_ 

2 

2 

14 

6 

8 

12 

1 

3 

3 

1 

- 

1 

9 

9 

14 

11 

6 

5 

1 

_ 

2 

4 

6 

5 

9 

9 

2 

4 

2 

1 

1 

4 

15 

5 

11 

12 

H 

5 

4 

_ 

3 

4 

6 

2 

1 

1 

3 

3 

2 

9 

10 

1 

- 

3 

2 

2 

3 

10 

6 

4 

2 

1 

4 

_ 

1 

9 

19 

3 

3 

4 

1 

2 

2 

10 

14 

3 

2 

3 

2 

- 

1 

10 

15 

5 

1 

2 

7 

1 

1 

10 

7 

3 

2 

2 

1 

10 

5 

3 

3 

3 

1 

8 

8 

1 

4 

4 

5 

7 

7 

3 

3 

7 

4 

6 

7 

1 

3 

2 

1 

2 

1 

2 

1 

3 

2 

1 

2 

3 

4 

* Each figure denotes the number of deaths in 50 cases.
† Each figure denotes the number of deaths in 100 cases.
‡ Each 80 cases.

of the hospital register have been kept separate from the odd numbers.
It is thus possible to compare the mortalities of groups of cases admitted
at the same time and selected from each other only after an interval of
years by a method as absolutely fair as seems possible.

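18. Take as a first instance the drawing of samples of 50 from a
general population which is divided into two portions in the proportions
of 18 to 82. The result of $M$ trials is given by the terms of the expansion

$$M \left( \frac{82}{100} + \frac{18}{100} \right)^{50}$$

or

$$M \left\{ \left(\frac{82}{100}\right)^{50} + 50 \left(\frac{82}{100}\right)^{49} \frac{18}{100} + \frac{50 \cdot 49}{1 \cdot 2} \left(\frac{82}{100}\right)^{48} \left(\frac{18}{100}\right)^{2} + \ldots \right\}.$$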

This  expansion  includes  the  case  of  the  groups  of  enteric  fever  where 
the  mortality  has  been  on  the  average  18  per  cent.  Thirty-eight  groups 
of  fifties  occur.  The  numbers  of  these  groups  with  each  definite  number 


Table III. Showing the Actual Number of Groups of 50 Cases of Enteric
Fever (Belvidere and Ruchill Hospitals) which contain a Definite
Number of Deaths compared with Expectation.


Number of Deaths.    Number of Groups with    Number of Groups Expected
                     x Deaths.                Theoretically.
       0                   -                        -
       1                   -                        -
       2                   -                       .1
       3                   -                       .4
       4                   1                      1.0
       5                   4                      2.1
       6                   5                      3.5
       7                   3                      4.5
       8                   4                      5.4
       9                   3                      5.5
      10                   9                      5.0
      11                   2                      4.0
      12                   1                      2.8
      13                   -                      1.8
      14                   3                      1.0
      15                   2                       .5
      16                   -                       .26
      17                   -                       .12
      18                   -                       .05
      19                   1                       .02
      20                   -                       .01
      21                   -                       .00

of  deaths  are  given  in  the  adjoining  table  and  compared  with  the 
numbers obtained from the expression given above. The fit is not good,
but  if  they  be  grouped  in  larger  classes,  namely  0-3  deaths,  4-7  deaths, 
etc.,  the  theoretical  and  the  actual  numbers  show  a good  correspondence. 
(Table  IV.) 

Table IV.


No. of Deaths.     Actual.     Theoretical.
    0-3               0            .5
    4-7              13          11.1
    8-11             18          19.9
   12-15              6           6.1
   16-19              1            .47

                     38          38.07
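
The theoretical columns of Tables III. and IV. follow from the expansion of par. 18 with $M = 38$. The Python sketch below is an illustration, with variable names introduced here; it recomputes the terms and pools them into the same classes. Values recomputed in this way agree with the tabulated figures to within a few tenths of a group.

```python
from math import comb

GROUPS, CASES, RATE = 38, 50, 0.18   # 38 groups of 50 cases, 18 per cent. mortality

# Expected number of groups showing k deaths: a term of 38 * (.82 + .18) ** 50.
expected = [GROUPS * comb(CASES, k) * (1 - RATE) ** (CASES - k) * RATE ** k
            for k in range(CASES + 1)]

# Pool into the larger classes used in Table IV.
classes = [(0, 3), (4, 7), (8, 11), (12, 15), (16, 19)]
grouped = [sum(expected[lo:hi + 1]) for lo, hi in classes]

for (lo, hi), value in zip(classes, grouped):
    print(f"{lo}-{hi} deaths: {value:.2f}")
```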

19. It is not at all clear, however, that we should draw from an
infinite class. The consecutive cases of scarlet fever in Ruchill number
2,960 with 82 deaths. Taking these numbers the relative proportions of
samples of 100 of different constitution can easily be calculated (par. 16).
These numbers are shown below and compared with those actually
found. (Table V.)


Table V. Scarlet Fever, Ruchill Hospital.


Number of    Number       Number     Difference.    Difference Squared
Deaths.      Expected.    Found.                    Divided by Theoretical
                                                    Numbers.
    0          1.83          1           .83              .38
    1          5.34          6           .66              .08
    2          7.41          6          1.41              .27
    3          6.85          9          2.15              .70
    4          4.66          4           .66              .10
    5          2.47          3           .53              .11
    6          1.06          1           .06              .00
    7           .38          -            -                -
    8           .11          -            -                -

              30.11        30.00          -              1.64

It is a very good fit considering the small number of the observations.
A similar table is given for Belvidere. Here in the earlier period the
mortality was much higher, 203 deaths taking place in 4,700 cases. The
variation in groups of 50 cases is considered. The actual and theoretical
figures are given in Table VI.

Table VI. Scarlet Fever, Belvidere Hospital.


Number of    Theoretical.      Actual.    Difference.    Difference Squared
Deaths.                                                  Divided by
                                                         Theoretical Numbers.
    0          10.4              10           .4               .00
    1          23.4              25          1.6               .11
    2          25.8              23          2.8               .30
    3          18.7              22          3.3               .59
    4           9.8               9           .8               .07
    5           4.0               2          2.0              1.00
    6           1.3 }  1.7        3          1.3              1.00
    7            .4 }

               93.8              94                           3.07

These figures show a very fair correspondence between fact and theory.
Possibly the fit might be better or might be worse with larger numbers,
for in the figures as given there is a correlation between high numbers of
deaths or low numbers of deaths in the corresponding fifties or hundreds
to be expected as the fevers vary somewhat in severity from period to
period, but the numbers are not sufficient to determine the amount
definitely.
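
The last column of Table VI. is formed by squaring each difference and dividing by the theoretical number. The Python sketch below repeats that arithmetic. Two readings of the printed table are assumed here: the theoretical number for 2 deaths is taken as 25.8, which makes the column total 93.8, and the classes of 6 and 7 deaths are pooled, with 3 groups found, which makes the actual total 94.

```python
# Figures of Table VI; the last entry pools the classes of 6 and 7 deaths.
theoretical = [10.4, 23.4, 25.8, 18.7, 9.8, 4.0, 1.7]   # 25.8 is an assumed reading
actual = [10, 25, 23, 22, 9, 2, 3]                      # pooled count 3 is inferred from the total

# Difference squared divided by the theoretical number, class by class.
contributions = [(a - t) ** 2 / t for a, t in zip(actual, theoretical)]
total = sum(contributions)

print([round(c, 2) for c in contributions], round(total, 2))
```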

20. We are now in a position to explain the proof of the chief theorem
in probable error as applied to vital statistics. The problem is: if we have
a population of $N$ individuals consisting of $s$ groups $y_1, y_2, \ldots y_s$, to find
the standard deviation of the group $y_p$. The chance of one individual being
drawn from this group is evidently $\frac{y_p}{N}$, and likewise the chance of his not
being drawn is $1 - \frac{y_p}{N}$. If then $m$ individuals have been selected by
chance the proportional distributions will be represented, as has been seen
before, by the terms of the expansion of

$$\left\{ \left(1 - \frac{y_p}{N}\right) + \frac{y_p}{N} \right\}^m,$$

the standard deviation of which is

$$\sqrt{m\, \frac{y_p}{N} \left(1 - \frac{y_p}{N}\right)}.$$

Now we do not know the ratio of $y_p$ to $N$. All that is known is the ratio
which the samples of these quantities bear to each other. We may, however,
assume, subject to subsequent investigation, that these ratios are for
practical purposes identical, keeping in mind that this at present is only an
assumption. If then $y'_p$ denotes the actual number of $y_p$ found, we have the
standard deviation of the error of $y_p$, since the total number of observations
is $m$, represented by

$$\sqrt{m\, \frac{y'_p}{m} \left(1 - \frac{y'_p}{m}\right)},$$

or suppressing the accents by

$$\sqrt{\frac{y_p\, (m - y_p)}{m}}.$$



21. To make clearer the meaning of the formula just found it is applied
to the example given before (par. 19) regarding the actual groups found
of 4,700 cases of scarlet fever. The mean death-rate and the standard
deviation of these, calculated as in the example (par. 11) are found to be
$M = 2.16$ and $\sigma = 1.475$. With the formula just given

$$\sigma = \sqrt{\frac{2.16\, (50 - 2.16)}{50}} = 1.437.$$

It must be noted that the grouping is quite asymmetric, nearly twice as
many cases occurring on the one side of the mean as on the other, so that
a smaller value of the death-rate than that given by the mean is twice as
probable as one that is larger. A second application is made to deaths in
each series of 100 cases of scarlet fever seen in Ruchill Hospital. Here,
as we have already seen (par. 11), $\sigma = 1.436$, the average number of
deaths per hundred is 2.73, so that we have by the formula of standard
error

$$\sqrt{\frac{2.73\, (100 - 2.73)}{100}} = \sqrt{2.658} = 1.630,$$

or in this case the actual range found is considerably less than that
expected theoretically.
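
Both applications can be repeated directly from the formula of par. 20. In the Python sketch below the function name `standard_error` is introduced here for illustration; the means are entered as unrounded fractions (203 deaths in 94 groups of fifty, and 82 deaths in 30 groups of a hundred) rather than as 2.16 and 2.73.

```python
from math import sqrt

def standard_error(mean_deaths, group_size):
    """Standard error of the number of deaths in a group of group_size
    cases, by the formula sqrt(y * (m - y) / m) of par. 20."""
    return sqrt(mean_deaths * (group_size - mean_deaths) / group_size)

belvidere = standard_error(203 / 94, 50)    # text gives 1.437 against 1.475 observed
ruchill = standard_error(82 / 30, 100)      # text gives 1.630 against 1.436 observed

print(round(belvidere, 3), round(ruchill, 3))
```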

22. The method in which the standard error varies can best be observed
by considering actual figures. In the two next tables (Tables VII.-VIII.)
two sets of values of the standard error are given. The first values given are
the absolute values. Thus, if from the column showing the number of
cases 5,000 is chosen, the corresponding value of the standard error when
the death-rate is 5 per cent. is seen to be 15.411; a 5 per cent. mortality
in 5,000 cases means 250 deaths. Twice the standard error is 30.8, so

Table VII. Showing the Value of the Standard Error of the Number of
Deaths for Different Percentage Death-Rates when the Number
of Cases Increases.


No. of                                  Percentage Mortality.
Cases.     1 p.c.   2 p.c.   3 p.c.   4 p.c.   5 p.c.   10 p.c.   20 p.c.   30 p.c.   40 p.c.   50 p.c.

   100      .995    1.400    1.706    1.960    2.179     3.000     4.000     4.583     4.899     5.000
   500     2.225    3.131    3.815    4.382    4.874     6.708     8.944    10.247    11.045    11.180
 1,000     3.146    4.427    5.394    6.197    6.892     9.487    12.649    14.491    15.492    15.811
 5,000     7.036    9.900   12.063   13.686   15.411    21.331    28.284    32.404    34.641    35.355
10,000     9.95    14.00    17.06    19.60    21.79     30.000    40.000    45.83     48.99     50.00

Table VIII. — Showing the Value of the Standard Error of the Percentage
Death-Rate when the Number of Cases Increases.

No. of                             Percentage Mortality.
Cases.     1 p.c.   2 p.c.   3 p.c.   4 p.c.   5 p.c.  10 p.c.  20 p.c.  30 p.c.  40 p.c.  50 p.c.
100          .995    1.400    1.706    1.960    2.179    3.000    4.000    4.583    4.899    5.000
500          .446     .626     .763     .872     .975    1.341    1.788    2.050    2.209    2.236
1,000        .315     .443     .539     .620     .689     .949    1.265    1.449    1.549    1.581
5,000        .121     .198     .211     .271     .308     .426     .566     .648     .693     .707
10,000       .100     .140     .171     .196     .218     .300     .400     .458     .490     .500

that the odds are 21 to 1 that the real number of deaths lies between 220
and 281. The figure in the same place in the second table is .308. This is
the standard error of the percentage death-rate, or the odds are again
21 to 1 that the true percentage death-rate lies between 4.382 and 5.618
if the death-rate is based on 5,000 cases.
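
The entry for 5,000 cases at 5 per cent. can be reproduced as follows. This is a sketch only, assuming the tables were computed from $\sqrt{Np(1-p)}$ for the number of deaths and $100\sqrt{p(1-p)/N}$ for the percentage, which is the standard-error formula of the text with $M = Np$; the function names are hypothetical.

```python
import math


def se_number_of_deaths(cases: int, pct_mortality: float) -> float:
    """Standard error of the number of deaths: sqrt(N * p * (1 - p))."""
    p = pct_mortality / 100.0
    return math.sqrt(cases * p * (1.0 - p))


def se_percentage_death_rate(cases: int, pct_mortality: float) -> float:
    """Standard error of the percentage death-rate: 100 * sqrt(p * (1 - p) / N)."""
    p = pct_mortality / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / cases)


cases, pct = 5000, 5.0
expected_deaths = cases * pct / 100.0            # 250 deaths
abs_se = se_number_of_deaths(cases, pct)         # entry of the first table
pct_se = se_percentage_death_rate(cases, pct)    # entry of the second table

# Limits at twice the standard error, as used in the text.
death_limits = (expected_deaths - 2 * abs_se, expected_deaths + 2 * abs_se)
rate_limits = (pct - 2 * pct_se, pct + 2 * pct_se)

print(f"SE of deaths     = {abs_se:.3f}, limits {death_limits[0]:.1f} to {death_limits[1]:.1f}")
print(f"SE of percentage = {pct_se:.3f}, limits {rate_limits[0]:.3f} to {rate_limits[1]:.3f}")
```

Looping the two functions over the row and column headings gives values against which the other entries of the tables may be compared.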

23. Several facts are easily seen in considering these tables. First,
the standard error increases with the increasing percentage mortality,
rising from .995 in the first row when the percentage is unity, to 5.000
when the percentage is 50; but relatively to the percentage itself it
steadily decreases. Twice the standard error when the percentage is 3
gives values 3 ± 3.4 for the limits, which will be exceeded once in every
22 times, while the same limits when the percentage is 40 are 40 ± 9.798.
The first instance tells us little; while the last suggests that a severe
mortality must be the rule.

24. It is also to be noted that very large numbers give little more
certainty than more moderate numbers. Considering the last two rows in
Table VIII., it is seen that the standard error is only reduced by about
15 per cent. when the mortality is 1 per cent., and about 29 per cent.
when the mortality is 50 per cent. as the numbers increase from 5,000 to
10,000.


Table IX. — Showing the Number of Deaths in each Series of 200 or 400
Cases.

Scarlet Fever,*     Scarlet Fever,*     Enteric Fever,†           Diphtheria,‡
Belvidere.          Ruchill.            Belvidere and Ruchill.    Ruchill.
Even.    Odd.       Even.    Odd.       Even.    Odd.             Even.    Odd.
 15        7         12       11         43       30               13       16
 21       24         12       16         40       33               21       13
 19       22         10        4         38       27               20       22
 15       18     [illegible]  —          39       55                9       17

* Each figure is the deaths in 400 cases. † Each figure is the deaths in 200 cases.
‡ Each figure is the deaths in 300 cases.

25. One more table is given to show how the death-rate may actually
vary  in  fairly  large  numbers.  In  each  instance  400  or  200  cases  are 
compared  with  400  or  200  parallel  cases,  the  first  the  even  number  in  the 
registers  and  the  second  the  odd ; the  differences  are  surprising.  200,  in 
the  case  of  enteric  fever  or  of  diphtheria,  is  a large  number ; 400,  in  the 
instance  of  scarlet  fever,  not  a small  number,  yet  had  any  treatment  been 
the  subject  of  investigation  and  the  alternate  cases  taken,  the  one  for 
treatment  and  the  other  for  control,  very  erroneous  conclusions  could 
easily  have  been  advanced. 

26.  One  more  formula  may  be  given  without  discussion.  It  is  the 
standard  error  of  the  mean.  The  proof  involves  principles  not  discussed 
in this paper, but the result can easily be understood. If $h$ be the mean
of the number of observations and $\sigma$ the standard deviation of these
observations, $h$ and $\sigma$ being calculated as described in par. 11, then the
standard error of the mean is

$$\frac{\sigma}{\sqrt{m}}$$

when $m$ is the total number of observations.

This signifies that if we are comparing the means of two series of
observations, no conclusion can be deemed even moderately definite unless
the difference between the two means is greater than

$$\frac{2\sigma}{\sqrt{m}}.$$
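
As an illustration of this rule, the short Python sketch below computes the standard error of a mean and the corresponding threshold of twice that value. The value of $\sigma$ is the one quoted earlier for the Ruchill series; the number of observations (25) is a hypothetical figure chosen only for the example.

```python
import math


def standard_error_of_mean(sigma: float, m: int) -> float:
    """Standard error of the mean of m observations: sigma / sqrt(m)."""
    return sigma / math.sqrt(m)


sigma = 1.436   # standard deviation of the observations (from the text)
m = 25          # hypothetical number of observations, for illustration only

se_mean = standard_error_of_mean(sigma, m)
threshold = 2 * se_mean   # a difference of means should exceed this

print(f"standard error of the mean = {se_mean:.4f}")
print(f"difference required        > {threshold:.4f}")
```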

In  the  proof  of  the  standard  error  given  in  par.  20  an  assumption  was 
made, namely, that the proportions of the sample which had been found by
random selection might be considered as equivalent to those existing in the
general population. Now in this case all the information in our possession
can be stated mathematically by saying that an event has happened $m$
times, and failed $n$ times. From this the probable constitution of the
universe  must  be  deduced.  This  problem  was  first  considered  by  Bayes, 
and  the  solution  is  known  as  Bayes’  Theorem.  The  proof  is  difficult,  but 
the  formula  is  easily  understood.  If  m deaths  and  n recoveries  have  taken 
place,  the  different  populations  from  which  these  may  have  been  drawn 
have  the  relative  probabilities  given  by  the  areas  of  the  successive  strips 
of  the  curve. 

$$y = x^{m}(1 - x)^{n},$$

or, if the total chance be denoted by unity, the chance of each type of
population existing is

$$\frac{x^{m}(1 - x)^{n}\,dx}{\int_{0}^{1} x^{m}(1 - x)^{n}\,dx}.$$

For practical purposes the ordinates at any points roughly give the
relative probabilities. Thus, if 3 represent the number of deaths found in
100 cases, and 97 the number of recoveries, the probabilities that the
general population possesses 1 per cent., 2 per cent., 3 per cent. death-rates,
etc., are given by substituting these values for $m$, $n$ and $x$ respectively in
the above formula, and are: —

$$(.01)^{3}(1 - .01)^{97}, \quad (.02)^{3}(1 - .02)^{97}, \quad (.03)^{3}(1 - .03)^{97}, \ \text{etc.}$$

The figures are given in the adjoining table (Table X). For comparison
the values obtained by the same formula where $m = 30$ and $n = 970$
(figures which express the same death-rate in a larger number of cases)
are arranged in a parallel column: —

Table X. — Showing the Chances of Each Constitution of the Population
when the Sample Contains (1) 3 of one kind (a) to 97 of the other (b)
and (2) 30 of (a) to 970 of (b).

Percentage of (a) in          (1)                        (2)
the Constitution of      Relative Size of           Relative Size of
the Sample.              Ordinates of               Ordinates of
                         $x^{3}(1-x)^{97}$          $x^{30}(1-x)^{970}$
  0.5                        .70                        .000
  1                         3.77                        .000
  1.5                        —                          .083
  2                        11.28                       3.342
  2.5                        —                        18.648
  3                        14.08                      30.235
  3.5                        —                        20.711
  4                        12.37                      10.085
  4.5                        —                         3.135
  5                         8.63                        .228
  6                         5.35                        .002
  7                         3.01                         —
  8                         1.02
  9                          .78
 10                          .30
 11                          .16                         —
 12                          .07                         —


It is seen, in the first instance, that a population constituted so as to
possess a four per cent. mortality is about equally as probable as one
constituted to possess a three per cent. mortality, and that a two per cent.
mortality is only a little less probable. It is also evident that populations
with mortalities of five and six per cent. will occur once in every seven
and twelve times respectively. Little can therefore be surmised concerning
the constitution of the population from information based on one
hundred observations. When one thousand cases, however, are considered,
the range is much less, as is seen in the table. The probability rises to
nearly two-thirds, but even here a population with a four per cent. constitution
will occur more than once for every three times the sample represents the
population accurately.
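
The ordinates for the smaller sample can be recomputed directly from the curve $y = x^{m}(1-x)^{n}$. The sketch below is illustrative only: the scale factor of $10^{7}$ is an assumption, chosen because it brings the ordinates to the magnitude of the tabulated figures, and a fresh computation need not agree with every hand-computed entry to the last digit.

```python
def relative_ordinate(x: float, m: int, n: int) -> float:
    """Ordinate of y = x^m * (1 - x)^n at a population death-rate x (a fraction)."""
    return x ** m * (1.0 - x) ** n


SCALE = 1e7        # assumed scaling, so that values are of the tabulated magnitude
m, n = 3, 97       # 3 deaths and 97 recoveries in 100 cases

ordinates = {
    pct: SCALE * relative_ordinate(pct / 100.0, m, n)
    for pct in (1, 2, 3)
}

for pct, value in ordinates.items():
    print(f"{pct} per cent.: relative ordinate = {value:.2f}")
```

Dividing each ordinate by the sum of all the ordinates in a column gives, roughly, the chance of each constitution, which is the sense in which the text speaks of a population occurring once in so many times.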

This problem is distinctly different from that considered in the previous
pages. It seeks to find the constitution of the general population from the
sample; that previously considered sought to find the probable constitution
of the sample when the type of the population is known. The standard
deviations are therefore different. In the present instance

$$\sigma^{2} = \frac{(m+1)(n+1)(m+n)^{2}}{(m+n+2)^{2}(m+n+3)},$$

which  is  larger  than  the  corresponding  value  of  the  standard  error  of  a 
sample,  most  markedly  so  when  m and  n are  small,  but  very  closely 
approximating when the values are greater. Thus, for $m = 3$, $n = 97$ the
values of the two standard errors are 1.91 and 1.71 respectively, while
for $m = 30$, $n = 970$ the corresponding figures are 5.46 and 5.39.

With 100 cases the limits given by twice the standard error are 3 ±
3.8, with 1,000 cases 30 ± 10.92, giving a range of percentage 0 to 6.8 in
the former case and 1.9 to 4.1 in the latter. Such are the limitations of
the  assumption  on  which  the  proof  given  in  par.  20  depends. 
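
The two standard errors compared above can be computed side by side. The following Python sketch assumes that the standard deviation of this section is $\sigma^{2} = (m+1)(n+1)(m+n)^{2} / \{(m+n+2)^{2}(m+n+3)\}$, a reading which reproduces the figures 1.91 and 5.46 quoted in the text, and that the standard error of a sample is $\sqrt{mn/(m+n)}$; the function names are hypothetical.

```python
import math


def se_of_sample(m: int, n: int) -> float:
    """Standard error of a sample: sqrt(m * n / (m + n))."""
    return math.sqrt(m * n / (m + n))


def sd_of_population_inferred(m: int, n: int) -> float:
    """Standard deviation when the population is inferred from the sample.

    Assumed form: sigma^2 = (m+1)(n+1)(m+n)^2 / ((m+n+2)^2 (m+n+3)).
    """
    total = m + n
    variance = (m + 1) * (n + 1) * total ** 2 / ((total + 2) ** 2 * (total + 3))
    return math.sqrt(variance)


results = {
    (m, n): (sd_of_population_inferred(m, n), se_of_sample(m, n))
    for m, n in ((3, 97), (30, 970))
}

for (m, n), (inferred, sample) in results.items():
    print(f"m = {m}, n = {n}: inferred {inferred:.3f}, sample {sample:.3f}")
```

The gap between the two values is wide for the sample of 100 and nearly closed for the sample of 1,000, which is the point made in the text.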

The  two  theorems  may,  however,  be  combined.  The  proof  is  given  by 
Prof.  Pearson  in  the  Philosophical  Magazine,  Mar.,  1907  : the  results  only 
concern  us  here. 

If $m$ and $n$ are the numbers of each kind found in the sample, and
if the next sample number $q$, then the standard deviation with samples
of number $q$ is given by

$$\sigma^{2} = \frac{q\,(m+1)(n+1)(m+n+q+2)}{(m+n+2)^{2}(m+n+3)}.$$

If $q = m + n$, i.e., if the standard deviation of $m$ or $n$ in samples
numbering $m + n$ is desired, then

$$\sigma^{2} = \frac{2\,(m+n)(m+1)(n+1)(m+n+1)}{(m+n+2)^{2}(m+n+3)},$$

which is approximately equal to $\dfrac{2mn}{m+n}$ if $m$ and $n$ be large, or a formula
similar to Poisson's is arrived at, though the two are not really comparable,
as they have been obtained on quite different premises.

If $q$ be unequal to $m+n$, but both numbers large, the formula becomes

$$\sigma^{2} = \frac{q^{2}\,mn}{(m+n)^{2}}\left(\frac{1}{m+n} + \frac{1}{q}\right).$$

This shows that if $m+n$ be small compared with $q$, the standard
deviation does not become smaller with the increase of $q$, or we cannot
predict a large from a small sample, but only the opposite, the latter
reason explaining why the standard deviations obtained experimentally in
the earlier part of this paper (par. 21) are in so close accord with the
ordinary theory.

The formulæ in this section, however, are those which should be used
to check the validity of conclusions drawn from figures.

An example will make this easier to understand. Let the first series
of cases be 100, namely, 40 of one class and 60 of the other, and let it be
desired to find the standard error in a second series of 1,000 cases.
Here $m = 40$, $n = 60$, $q = 1{,}000$, so that

$$\sigma^{2} = \frac{1{,}000^{2} \times 40 \times 60}{100^{2}}\left(\frac{1}{100} + \frac{1}{1{,}000}\right) = 2{,}640,$$

or $\sigma = 51.4$.

In 1,000 cases 400 cases are to be expected if the proportions of the
first sample are preserved, so that 400 ± 51.4 × 2 are the limits, as before
described. Were there 1,000 cases in the first sample with 600 of one
type and 400 of the other the standard error would be given by

$$\sigma^{2} = \frac{1{,}000^{2} \times 400 \times 600}{1{,}000^{2}}\left(\frac{1}{1{,}000} + \frac{1}{1{,}000}\right) = 480,$$

or $\sigma = 21.9$,

or the standard error is less than one-half of that in the previous instance.
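
The arithmetic of this example can be verified with a few lines of Python. This is a sketch under stated assumptions: the large-number formula is taken as $\sigma^{2} = \frac{q^{2}mn}{(m+n)^{2}}\left(\frac{1}{m+n} + \frac{1}{q}\right)$, and the exact expression as $\sigma^{2} = q(m+1)(n+1)(m+n+q+2) / \{(m+n+2)^{2}(m+n+3)\}$, of which the former is the limiting form; the function names are illustrative.

```python
import math


def variance_large_numbers(m: int, n: int, q: int) -> float:
    """Large-number form: q^2 * m * n / (m + n)^2 * (1/(m + n) + 1/q)."""
    total = m + n
    return q ** 2 * m * n / total ** 2 * (1.0 / total + 1.0 / q)


def variance_exact(m: int, n: int, q: int) -> float:
    """Assumed exact form: q(m+1)(n+1)(m+n+q+2) / ((m+n+2)^2 (m+n+3))."""
    total = m + n
    return q * (m + 1) * (n + 1) * (total + q + 2) / ((total + 2) ** 2 * (total + 3))


# First sample of 100 (40 and 60), second series of 1,000 cases.
var_small_first = variance_large_numbers(40, 60, 1000)
# First sample of 1,000 (400 and 600), second series of 1,000 cases.
var_large_first = variance_large_numbers(400, 600, 1000)

print(f"first sample 100:   variance {var_small_first:.0f}, sigma {math.sqrt(var_small_first):.1f}")
print(f"first sample 1,000: variance {var_large_first:.0f}, sigma {math.sqrt(var_large_first):.1f}")
print(f"exact form, first sample 100: sigma {math.sqrt(variance_exact(40, 60, 1000)):.1f}")
```

With a first sample of only 100 the exact form gives a somewhat smaller value than the large-number form, as is to be expected when $m$ and $n$ are not large.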

Apart from tables these formulæ are of difficult application, as the
distributions  are  so  markedly  skew  or  asymmetrical,  the  variation  in  one 
direction  being  much  greater  than  in  the  other,  and  the  mean  not  giving 
the  most  probable  value.  Actual  calculation  in  individual  cases  is  very 
laborious. 

III. 

The errors due to random selection of the population have been fully
discussed, but there is yet one other type of error which is not usually given
sufficient weight in actual statistical work, that is, the error due to imperfection
of technique. This appears in a variety of ways, in the case of the
astronomer in the slight difference between one observation and another,
in the case of the marksman in the number of inners or outers he makes
in comparison with the number of bull's-eyes. All human measurements
are liable to a certain amount of error variable in different individuals, but
in the end more or less describable in each separate individual by some
definite law.

Public  health  statistics  seem  at  first  sight  comparatively  free  from 
such errors. We have a certain number of deaths and each of these represents
a fact. But even weekly death-rates are far from certain apart
altogether from the random selection of deaths in each week. As five days
are allowed for the certification of a death, a few wet days at the end of a
week may throw many certifications from one week into the next. Even
such a comparatively simple matter as the number of deaths from a disease
has a large experimental error. Out of the 2,960 cases of scarlet fever
which were treated in Ruchill Hospital as before described, 82 deaths occurred.
Of these, 5 could not definitely be ascribed to the fever, occurring
associated with conditions which themselves were not likely to be fatal, but
which the double disease made specially dangerous. This is a fair number,
as the standard error of 82 deaths in 2,960 cases is 8.9, so that the
experimental  error  is  more  than  half  of  this.  On  such  large  numbers, 
however,  it  may  be  neglected.  In  the  groups  each  of  200  cases,  however, 
it  appears  in  the  following  manner  and  might  easily  lead  to  false  reasoning. 

Total Deaths.     Experimental Error.
      8                   1
     11                   1
      4                   1
      3                   1
      6                   1


I feel  convinced  that  this  is  an  under  and  not  an  over  estimate. 

In  some  examples  which  have  been  before  me  lately  in  another  branch 
of  science,  the  experimental  error  is  for  each  separate  observation  very 
large,  and  this  case  is  worth  considering.  If  we  have  a limited  number 
of  observations,  each  of  which  we  know  is  open  to  a large  experimental 
error,  we  may  consider  the  matter  in  this  way.  Let  the  observations  have 
values $x_{1}, \ldots, x_{m}$, and let the mean be $h$, so that

$$h = \frac{x_{1} + x_{2} + x_{3} + \cdots + x_{m}}{m};$$

let the standard deviation of each term be $\sigma$, and let it be required to know
the standard deviation of $h$.

This is found to be the same result as that given in par. 26:

$$\frac{\sigma}{\sqrt{m}}.$$

$\sigma$, however, is not the “standard error” due to random sampling, but the
“standard error” of the experimental error, the result being derived by a
different process of reasoning. It is easily seen that a difference exists, for
if we have two groups of quantities with the same disposition in statistical
series, one which can be measured exactly and the other only inexactly, the
error of the mean from random sampling will be the same in both cases,
but the experimental error will greatly differ. This part of the subject
I intend to develop more fully later when I am in possession of the
requisite public health data.

IV. 

The theorem of this section is due to Prof. Pearson and furnishes a very
useful criterion as to whether groups of statistics really fulfil certain conditions
with reasonable probability. The proof of the theorem is very
difficult and need not even be outlined, but the application, given a table
of the function, is very easy. The method of calculation is as follows: —
If there be an actual distribution and a theoretical distribution, the difference
of the values actual and theoretical of each term is to be taken,
squared and divided by the corresponding theoretical value. These values
are then summed. This sum is denoted by $\chi^{2}$. A table of the function
is then consulted. In this the value of $P$ is tabulated according to the
values of $\chi^{2}$ and of $n$, the number of terms compared, $P$ being the probability
that in a certain number of trials a worse fit than the theoretical
values will be found. In paragraph 19 two examples have been given.
In Table V. we find that $\chi^{2} = 2.64$. The number of terms compared is
seven. We then consult the table and find $\chi^{2} = 2$ gives $P = .920$ and
$\chi^{2} = 3$, $P = .809$, whence $\chi^{2} = 2.64$ gives approximately $P = .849$, or in 849 random
trials out of one thousand a worse fit between theory and observation
would occur. In the second sample $\chi^{2} = 3.07$. Hence the value of $P = .8$
approximately, or in 8 trials out of 10 a worse result would be found.
In other words, theory and observation are in good correspondence.
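
The interpolation used for the first example can be written out explicitly. The sketch below assumes simple linear interpolation between the two tabulated points, which reproduces the value quoted; the function name is illustrative.

```python
def interpolate_p(chi_sq: float, lower: tuple, upper: tuple) -> float:
    """Linearly interpolate P between two tabulated (chi-square, P) pairs."""
    (x0, p0), (x1, p1) = lower, upper
    return p0 + (chi_sq - x0) * (p1 - p0) / (x1 - x0)


# Tabulated values for seven terms compared, as quoted in the text.
p_table_v = interpolate_p(2.64, (2.0, 0.920), (3.0, 0.809))

print(f"P for chi-square 2.64 is approximately {p_table_v:.3f}")
```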

A third  example  is  taken  from  one  of  my  old  hospital  reports,  1903-4. 
The  question  to  be  ascertained  was  if  there  was  any  special  day  or  series 
of  days  on  which  children  sickened  from  scarlet  fever.  The  days  of  the 
week  on  which  907  children  at  school  ages  sickened  during  the  months  of 
August  and  September,  1901-1904  were  tabulated.  These  months  were 
chosen  as  being  the  epidemic  months,  and  also  the  months  immediately 
after  the  holidays,  when  many  susceptible  children  go  to  the  school  for 
the  first  time.  If  there  be  any  special  evidence  of  school  infection,  it 
should  be  seen  in  a variation  in  the  numbers  sickening  on  different  days, 
as  the  schools  do  not  meet  on  either  Saturday  or  Sunday.  The  figures  are 
given  in  the  adjoining  table  in  which  also  the  application  of  the  method 
is shown. The theoretical value to be tested here is obviously the mean
number of all the cases, namely $\tfrac{1}{7} \times 907$ or 129.6.

Table XI. — Showing Days of Sickening in 907 Cases of Scarlet Fever.

               No. of     Theoretical                                  (Difference)² ÷
               Cases.     Value.        Difference.   (Difference)²    Theoretical Value.
Sunday          124        129.6           -5.6           31.36             .24
Monday          143        129.6           13.4          179.56            1.38
Tuesday         117        129.6          -12.6          158.76            1.22
Wednesday       134        129.6            4.4           19.36             .15
Thursday        120        129.6           -9.6           92.16             .71
Friday          143        129.6           13.4          179.56            1.38
Saturday        126        129.6           -3.6           12.96             .10
Total           907        907              0.0          673.72            5.18


Thus $\chi^{2} = 5.18$ or $P = .522$, or in half the trials made as much
divergence would be found, so that there is little evidence from this source
that scarlatina is spread by schools.
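
The calculation of this example can be sketched in Python as follows. Two assumptions are made and should be checked against Prof. Pearson's tables: first, that with seven terms compared the probability corresponds to six degrees of freedom (this agrees with the values $P = .920$ and $.809$ quoted for $\chi^{2} = 2$ and $3$); secondly, that $P$ may be obtained from the closed form $e^{-\chi^{2}/2}\sum_{j<k}(\chi^{2}/2)^{j}/j!$, valid when the number of degrees of freedom is the even number $2k$. The Saturday figure is taken as 126, the value implied by the total of 907 and by the tabulated difference of -3.6. Because the unrounded theoretical value $907/7$ is used, the result differs slightly in the second decimal from the figure obtained by summing the rounded column.

```python
import math


def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all terms."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_worse_fit_even_df(chi_sq: float, df: int) -> float:
    """P(chi-square exceeds chi_sq) for an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = chi_sq / 2.0
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms


# Sunday to Saturday; Saturday taken as 126 (see the note above).
cases = [124, 143, 117, 134, 120, 143, 126]
theoretical = [sum(cases) / len(cases)] * len(cases)

chi_sq = chi_square(cases, theoretical)
p_value = p_worse_fit_even_df(chi_sq, df=len(cases) - 1)

print(f"chi-square = {chi_sq:.2f}, P = {p_value:.3f}")
```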

As  tables  of  this  function  are  rather  inaccessible,  I have  constructed  a 
short  table  from  a diagram  in  my  note-book.  It  does  not  profess  to  be 
more  than  a first  approximation,  and  it  is  constructed  on  this  principle. 
If the probability be less than .1 it is of not much practical public health
use. The probabilities have, therefore, been given in the top line in
values from .1, .2, ... to .9. The number of instances compared is in the
first vertical column. The value of $\chi^{2}$ which gives each of these values
is tabulated. We can then see at a glance the probability of the result.
As the value of $\chi^{2}$ and the value of $n$ are known, the probability can at
once  be  placed  between  two  adjacent  decimals  of  unity,  which  is  quite 
close  enough  for  all  practical  purposes. 
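
The use of such a table amounts to a simple look-up, which may be sketched in Python. The row of values below is the one for $n = 7$ in Table XII.; the function name and the convention of returning `None` for a bound lying outside the table are assumptions made for the illustration.

```python
from bisect import bisect_left

# Probabilities in the top line of the table, in printed order.
P_LEVELS = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

# Values of chi-square for n = 7 (seven terms compared), from Table XII.
ROW_N7 = [2.2, 3.0, 3.8, 4.6, 5.3, 6.2, 7.2, 8.6, 10.5]


def bracket_probability(chi_sq: float, row, levels=P_LEVELS):
    """Return (lower, upper) adjacent tabulated probabilities enclosing P.

    A bound is None when P falls outside the range of the table,
    i.e. above the largest or below the smallest tabulated probability.
    """
    i = bisect_left(row, chi_sq)            # first tabulated value >= chi_sq
    upper = levels[i - 1] if i > 0 else None
    lower = levels[i] if i < len(row) else None
    return lower, upper


# The scarlet fever example: chi-square 5.18 with seven terms compared.
bounds = bracket_probability(5.18, ROW_N7)
print(f"P lies between {bounds[0]} and {bounds[1]}")
```

For the example of the days of sickening this places the probability between .5 and .6, in agreement with the value of .522 found from the fuller table.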


Table XII. — Showing the Values of $\chi^{2}$ for certain Values of P and n.

                              Values of P.
  n       .9      .8      .7      .6      .5      .4      .3      .2      .1
  3       —       —       .7     1.0     1.2     1.8     2.4     3.4     4.6
  4       —      1.0     1.4     1.9     2.4     2.9     3.6     4.6     6.1
  5      1.0     1.7     2.2     2.8     3.4     4.1     4.9     5.9     7.7
  6      1.6     2.3     3.0     3.6     4.3     5.1     6.0     7.3     9.2
  7      2.2     3.0     3.8     4.6     5.3     6.2     7.2     8.6    10.5
  8      2.9     3.8     4.7     5.4     6.3     7.3     8.4     9.7    12.0
  9      3.5     4.6     5.5     6.4     7.4     8.4     9.5    11.0    13.2
 10      4.2     5.4     6.4     7.4     8.4     9.4    10.6    12.2    14.6
 12      5.6     7.0     8.2     9.3    10.4    11.5    12.8    14.5    17.2
 14      7.0     8.7     9.9    11.1    12.3    13.6    15.1    16.9    19.6
 16      8.6    10.3    11.8    13.0    14.4    15.7    16.3    19.3    22.3

